Result: Boosting HPC data analysis performance with the ParSoDA-Py library

Title:
Boosting HPC data analysis performance with the ParSoDA-Py library
Publisher Information:
Springer 2024-02
Document Type:
Electronic Resource
Availability:
Open access content
http://creativecommons.org/licenses/by/4.0
Open Access
Attribution 4.0 International
Note:
application/pdf
English
Other Numbers:
HGF oai:upcommons.upc.edu:2117/404582
Belcastro, L. [et al.]. Boosting HPC data analysis performance with the ParSoDA-Py library. "Journal of Supercomputing", February 2024.
ISSN 0920-8542 (print)
ISSN 1573-0484 (online)
DOI 10.1007/s11227-023-05883-z
OCLC 1427132227
Contributing Source:
UNIV POLITECNICA DE CATALUNYA
From OAIster®, provided by the OCLC Cooperative.
Accession Number:
edsoai.on1427132227
Database:
OAIster

Further Information

Developing and executing large-scale data analysis applications in parallel and distributed environments can be a complex and time-consuming task. Developers are often diverted from their application logic to handle technical details of the underlying runtime and related issues. To simplify this process, ParSoDA, a Java library, has been proposed to facilitate the development of parallel data mining applications executed on HPC systems; it provides built-in scalability mechanisms that rely on the Hadoop and Spark frameworks. This paper presents ParSoDA-Py, the Python version of the ParSoDA library, which extends support to other commonly used runtimes and libraries for big data analysis. After a complete redesign, ParSoDA can now be easily integrated with other Python-based distributed runtimes for HPC systems, such as COMPSs and Apache Spark, and with the large ecosystem of Python-based data processing libraries. The paper discusses the adaptation process, which takes the new technical requirements into account, and evaluates both usability and scalability through several case-study applications.
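The key design idea described above, writing the application logic once and letting the library handle its parallel execution on an interchangeable runtime, can be illustrated with a minimal sketch. It is not ParSoDA-Py's actual API: the names `Pipeline`, `SequentialBackend`, and `ThreadBackend`, as well as the filter/map/reduce step structure, are hypothetical, and a local thread pool merely stands in for a distributed runtime such as COMPSs or Apache Spark.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


class SequentialBackend:
    """Runs every task in the calling thread, one after another."""

    def map(self, fn: Callable[[Any], Any], items: List[Any]) -> List[Any]:
        return [fn(item) for item in items]


class ThreadBackend:
    """Hypothetical stand-in for a distributed runtime (e.g. COMPSs or Spark):
    runs tasks on a local thread pool and returns results in input order."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def map(self, fn: Callable[[Any], Any], items: List[Any]) -> List[Any]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, items))


@dataclass
class Pipeline:
    """Application logic only: it never refers to the runtime that executes it."""

    filter_fn: Callable[[Any], bool]
    map_fn: Callable[[Any], Tuple[Any, Any]]  # item -> (key, value)
    reduce_fn: Callable[[List[Any]], Any]

    def run(self, partitions: List[List[Any]], backend) -> Dict[Any, Any]:
        # Filter and map each data partition as an independent task.
        def process(partition: List[Any]) -> List[Tuple[Any, Any]]:
            return [self.map_fn(x) for x in partition if self.filter_fn(x)]

        mapped = backend.map(process, partitions)

        # Group values by key (the shuffle a real runtime would distribute).
        groups: Dict[Any, List[Any]] = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)

        # Reduce each key's values as an independent task.
        keys = list(groups)
        reduced = backend.map(lambda k: self.reduce_fn(groups[k]), keys)
        return dict(zip(keys, reduced))


# Usage: count English-language posts per tag on two interchangeable backends.
posts = [
    [{"tag": "hpc", "lang": "en"}, {"tag": "spark", "lang": "it"}],
    [{"tag": "hpc", "lang": "en"}, {"tag": "compss", "lang": "en"}],
]
pipeline = Pipeline(
    filter_fn=lambda p: p["lang"] == "en",
    map_fn=lambda p: (p["tag"], 1),
    reduce_fn=sum,
)
seq = pipeline.run(posts, SequentialBackend())
par = pipeline.run(posts, ThreadBackend(workers=2))
print(seq)  # {'hpc': 2, 'compss': 1}
```

Because `Pipeline.run` only calls `backend.map`, switching runtimes in this sketch means passing a different backend object while the user-supplied functions stay unchanged.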
This work has been partially funded by the European Commission’s Horizon 2020 Framework Programme and the European High-Performance Computing Joint Undertaking (JU) under Grant Agreement No 955558, and by MCIN/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR (PCI2021-121957), project eFlows4HPC. It has also been supported by the Spanish Government (PID2019-107255GB) and by the Departament de Recerca i Universitats de la Generalitat de Catalunya to the Research Group MPiEDist (2021 SGR 00412). We also acknowledge financial support from the “National Centre for HPC, Big Data and Quantum Computing” (CN00000013 - CUP H23C22000360005) and from the “FAIR - Future Artificial Intelligence Research” project (CUP H23C22000860006).
Peer Reviewed
Postprint (published version)