Result: Data management for distributed computational workflows: An iRODS-based setup and its performance.

Title:
Data management for distributed computational workflows: An iRODS-based setup and its performance.
Authors:
Hayek M; Leibniz Supercomputing Centre (LRZ), Bavarian Academy of Sciences and Humanities, Garching near Munich, Germany.
Golasowski M; IT4Innovations National Supercomputing Center (IT4I), VŠB - Technical University of Ostrava, Ostrava, Czech Republic.
Hachinger S; Leibniz Supercomputing Centre (LRZ), Bavarian Academy of Sciences and Humanities, Garching near Munich, Germany.
García-Hernández RJ; Leibniz Supercomputing Centre (LRZ), Bavarian Academy of Sciences and Humanities, Garching near Munich, Germany; MNM-Team, Ludwig-Maximilians-Universität (LMU) München, Munich, Germany.
Munke J; Leibniz Supercomputing Centre (LRZ), Bavarian Academy of Sciences and Humanities, Garching near Munich, Germany.
Lindner G; MNM-Team, Ludwig-Maximilians-Universität (LMU) München, Munich, Germany.
Slaninová K; IT4Innovations National Supercomputing Center (IT4I), VŠB - Technical University of Ostrava, Ostrava, Czech Republic.
Tunka P; Leibniz Supercomputing Centre (LRZ), Bavarian Academy of Sciences and Humanities, Garching near Munich, Germany.
Vondrák V; IT4Innovations National Supercomputing Center (IT4I), VŠB - Technical University of Ostrava, Ostrava, Czech Republic.
Kranzlmüller D; Leibniz Supercomputing Centre (LRZ), Bavarian Academy of Sciences and Humanities, Garching near Munich, Germany; MNM-Team, Ludwig-Maximilians-Universität (LMU) München, Munich, Germany.
Martinovič J; IT4Innovations National Supercomputing Center (IT4I), VŠB - Technical University of Ostrava, Ostrava, Czech Republic.
Source:
PloS one [PLoS One] 2026 Jan 12; Vol. 21 (1), pp. e0340757. Date of Electronic Publication: 2026 Jan 12 (Print Publication: 2026).
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Public Library of Science; Country of Publication: United States; NLM ID: 101285081; Publication Model: eCollection; Cited Medium: Internet; ISSN: 1932-6203 (Electronic); Linking ISSN: 19326203; NLM ISO Abbreviation: PLoS One; Subsets: MEDLINE
Imprint Name(s):
Original Publication: San Francisco, CA : Public Library of Science
References:
Sci Data. 2016 Mar 15;3:160018. (PMID: 26978244)
Open Res Eur. 2024 Jul 9;4:136. (PMID: 39219788)
J Open Source Softw. 2021;6(63):. (PMID: 39469147)
Entry Date(s):
Date Created: 20260112 Date Completed: 20260112 Latest Revision: 20260115
Update Code:
20260115
PubMed Central ID:
PMC12795369
DOI:
10.1371/journal.pone.0340757
PMID:
41525253
Database:
MEDLINE

Further Information

Modern data-management frameworks promise flexible and efficient management of data and metadata across storage backends. However, such claims need to be put to a meaningful test in daily practice. We conjecture that such frameworks should be fit to construct a data backend for workflows which use geographically distributed high-performance and cloud computing systems. Cross-site data transfers within such a backend should largely saturate network bandwidth, in particular when parameters such as buffer sizes are optimized. To explore this further, we evaluate the "integrated Rule-Oriented Data System" iRODS with EUDAT's B2SAFE module as data backend for the "Distributed Data Infrastructure" within the LEXIS Platform for complex computing workflow orchestration and distributed data management. The focus of our study is on testing our conjectures, i.e., on the construction and assessment of the data infrastructure and on measurements of data-transfer performance over the wide-area network between two selected supercomputing sites connected to LEXIS. We analyze limitations and identify optimization opportunities. Efficient utilization of the available network bandwidth is possible and depends on suitable client configuration and file size. Our work shows that systems such as iRODS nowadays fit the requirements for integration in federated computing infrastructures involving web-based authentication flows with OpenID Connect and rich online services. We are continuing to exploit these properties in the EXA4MIND project, where we aim at optimizing data-heavy workflows, integrating various systems for managing structured and unstructured data.
(Copyright: © 2026 Mohamad Hayek et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)

Financial: This work has been co-funded by the EU’s Horizon 2020 Research and Innovation Programme (2014–2020) under grant agreement No. 825532 (Project LEXIS – “Large-scale EXecution for Industry and Society”). Furthermore, this work received support from the EXA4MIND project (“EXtreme Analytics for MINing Data spaces”), funded by the European Union’s Horizon Europe Research and Innovation Programme, under Grant Agreement No. 101092944. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them. This work has also received significant support from the project grant ICBxBCI for Czech-Bavarian collaboration and researcher mobility of the Bavarian State Chancellery (Bayerische Staatskanzlei). This work was supported by the Ministry of Education, Youth and Sports of the Czech Republic through the eINFRA CZ (ID:90254).

Non-Financial: IT4Innovations is part of the iRODS Consortium and of the EUDAT CDI Council. The decision towards these participations has been influenced by the use of these frameworks in LEXIS, not the other way around. The measurements presented have been conducted before the iRODS Consortium membership. All this does not alter our adherence to PLOS ONE policies on sharing data and materials.