F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems.
In recent decades, High Performance Computing (HPC) systems have accelerated scientific discoveries and innovations across different domains, from epidemic studies to climate science. For the sustainable development of HPC systems, it is fundamental to address their environmental impact in terms of carbon footprint and energy requirements, while ensuring high system throughput. Analyzing and predicting HPC job execution characteristics is instrumental in developing workload management strategies that simultaneously optimize system throughput and minimize environmental impact. However, the development of accurate predictive models is hindered by the lack of large public datasets. In this paper, we present F-DATA, a public dataset containing information on around 24 million jobs executed on Fugaku, the most powerful supercomputer during the data collection phase. The data contains an extensive set of features, enabling the prediction of a multitude of job characteristics. The sensitive job data appears in both anonymized and irreversibly encoded versions. The encoding is based on a Natural Language Processing model and retains sensitive but useful job information for prediction purposes without violating privacy concerns. [ABSTRACT FROM AUTHOR]
Copyright of Scientific Data is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)