Result: Boosting HPC data analysis performance with the ParSoDA-Py library

Title:

Boosting HPC data analysis performance with the ParSoDA-Py library

Authors:

Belcastro, Loris; Giampà, Salvatore; Marozzo, Fabrizio; Talia, Domenico; Trunfio, Paolo; Badia Sala, Rosa Maria; Ejarque, Jorge; Mammadli, Nihad

Contributors:

Barcelona Supercomputing Center

Publisher Information:

Springer

Publication Year:

2024

Collection:

Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge

Subject Terms:

UPC subject areas::Computer science::Computer architecture, Big data, Big data analysis, Parallel computing, HPDA, PyCOMPSs, Spark, HPC, Supercomputers

Document Type:

Academic journal; article in journal/newspaper

File Description:

application/pdf

Language:

English

Relation:

https://link.springer.com/article/10.1007/s11227-023-05883-z; info:eu-repo/grantAgreement/EC/H2020/955558/EU/Enabling dynamic and Intelligent workflows in the future EuroHPCecosystem/eFlows4HPC; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PCI2021-121957/ES/ENABLING DYNAMIC AND INTELLIGENT WORKFLOWS IN THE FUTURE EUROHPCECOSYSTEM/; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C22/ES/UPC-COMPUTACION DE ALTAS PRESTACIONES VIII/; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C21/ES/BSC - COMPUTACION DE ALTAS PRESTACIONES VIII/; http://hdl.handle.net/2117/404582

DOI:

10.1007/s11227-023-05883-z

Availability:

http://hdl.handle.net/2117/404582
https://doi.org/10.1007/s11227-023-05883-z

Rights:

Attribution 4.0 International ; http://creativecommons.org/licenses/by/4.0/ ; Open Access

Accession Number:

edsbas.66AD7F6E

Database:

BASE

Further Information

Developing and executing large-scale data analysis applications in parallel and distributed environments can be a complex and time-consuming task. Developers often find themselves diverted from their application logic to handle technical details about the underlying runtime and related issues. To simplify this process, ParSoDA, a Java library, has been proposed to facilitate the development of parallel data mining applications executed on HPC systems. It provides built-in scalability mechanisms that rely on the Hadoop and Spark frameworks. This paper presents ParSoDA-Py, the Python version of the ParSoDA library, which extends support to commonly used runtimes and libraries for big data analysis. After a complete library redesign, ParSoDA can now be easily integrated with other Python-based distributed runtimes for HPC systems, such as COMPSs and Apache Spark, and with the large ecosystem of Python-based data processing libraries. The paper discusses the adaptation process, which takes into consideration the new technical requirements, and evaluates both usability and scalability through case study applications.

This work has been partially funded by the European Commission’s Horizon 2020 Framework program and the European High-Performance Computing Joint Undertaking (JU) under Grant agreement No 955558 and by MCIN/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR (PCI2021-121957), project eFlows4HPC. It has also been supported by the Spanish Government (PID2019-107255GB) and by the Departament de Recerca i Universitats de la Generalitat de Catalunya to the Research Group MPiEDist (2021 SGR 00412). We also acknowledge financial support from “National Centre for HPC, Big Data and Quantum Computing,” CN00000013 - CUP H23C22000360005, and from “FAIR - Future Artificial Intelligence Research” Project - CUP H23C22000860006.

Peer Reviewed; Postprint (published version)
