Data management for distributed computational workflows: An iRODS-based setup and its performance.
Open Res Eur. 2024 Jul 9;4:136. (PMID: 39219788)
J Open Source Softw. 2021;6(63):. (PMID: 39469147)
Modern data-management frameworks promise flexible and efficient management of data and metadata across storage backends. However, such claims need to be put to a meaningful test in daily practice. We conjecture that such frameworks should be suitable for constructing a data backend for workflows which use geographically distributed high-performance and cloud computing systems. Cross-site data transfers within such a backend should largely saturate network bandwidth, in particular when parameters such as buffer sizes are optimized. To explore this further, we evaluate the "integrated Rule-Oriented Data System" iRODS with EUDAT's B2SAFE module as data backend for the "Distributed Data Infrastructure" within the LEXIS Platform for complex computing workflow orchestration and distributed data management. The focus of our study is on testing our conjectures, i.e., on the construction and assessment of the data infrastructure and on measurements of data-transfer performance over the wide-area network between two selected supercomputing sites connected to LEXIS. We analyze limitations and identify optimization opportunities. Efficient utilization of the available network bandwidth is possible and depends on suitable client configuration and file size. Our work shows that systems such as iRODS now meet the requirements for integration in federated computing infrastructures involving web-based authentication flows with OpenID Connect and rich online services. We are continuing to exploit these properties in the EXA4MIND project, where we aim at optimizing data-heavy workflows, integrating various systems for managing structured and unstructured data.
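The link between buffer sizes and bandwidth saturation mentioned above can be illustrated with the standard bandwidth-delay product. The following is a minimal sketch, not the authors' method: the function names are hypothetical, and the link speed, round-trip time, and buffer size in the usage lines are illustrative assumptions rather than values measured in the study.

```python
def bandwidth_delay_product_bytes(bandwidth_gbit_s: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a link of this bandwidth busy.

    bandwidth_gbit_s: link capacity in Gbit/s (assumed value, not from the study)
    rtt_ms: round-trip time between the two sites in milliseconds (assumed value)
    """
    bits_in_flight = bandwidth_gbit_s * 1e9 * rtt_ms / 1e3
    return bits_in_flight / 8


def max_single_stream_gbit_s(buffer_bytes: int, rtt_ms: float) -> float:
    """Upper bound on the throughput of one stream limited by a fixed buffer.

    At most one buffer's worth of data can be unacknowledged per round trip,
    so throughput cannot exceed buffer size divided by round-trip time.
    """
    return buffer_bytes * 8 / (rtt_ms / 1e3) / 1e9


if __name__ == "__main__":
    # Illustrative numbers only: a 10 Gbit/s path with 10 ms round-trip time.
    needed = bandwidth_delay_product_bytes(10.0, 10.0)
    print(f"Buffer needed to fill the link: {needed / 1e6:.1f} MB")

    # A hypothetical 4 MiB buffer on the same path caps a single stream well below 10 Gbit/s.
    cap = max_single_stream_gbit_s(4 * 1024 * 1024, 10.0)
    print(f"Single-stream ceiling with 4 MiB buffer: {cap:.2f} Gbit/s")
```

Under these assumed numbers, a single stream with a buffer smaller than the bandwidth-delay product cannot fill the link, which is one reason why client configuration (buffer size, number of parallel streams) can matter for wide-area transfers.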
(Copyright: © 2026 Mohamad Hayek et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Financial: This work has been co-funded by the EU’s Horizon 2020 Research and Innovation Programme (2014–2020) under grant agreement No. 825532 (Project LEXIS – “Large-scale EXecution for Industry and Society”). Furthermore, this work received support from the EXA4MIND project (“EXtreme Analytics for MINing Data spaces”), funded by the European Union’s Horizon Europe Research and Innovation Programme, under Grant Agreement No. 101092944. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them. This work has also received significant support from the project grant ICBxBCI for Czech-Bavarian collaboration and researcher mobility of the Bavarian State Chancellery (Bayerische Staatskanzlei). This work was supported by the Ministry of Education, Youth and Sports of the Czech Republic through the eINFRA CZ (ID: 90254).
Non-Financial: IT4Innovations is part of the iRODS Consortium and of the EUDAT CDI Council. The decision towards these participations has been influenced by the use of these frameworks in LEXIS, not the other way around. The measurements presented have been conducted before the iRODS Consortium membership. All this does not alter our adherence to PLOS ONE policies on sharing data and materials.