Title:
Parallelizing autotuning for HPC applications: Unveiling the potential of the speculation strategy in Bayesian optimization
Source:
The International Journal of High Performance Computing Applications, vol 39, iss 5
Publisher Information:
eScholarship, University of California
Publication Year:
2025
Collection:
University of California: eScholarship
Document Type:
Academic journal; article in journal/newspaper
File Description:
application/pdf
Language:
unknown
DOI:
10.1177/10943420251362001
Rights:
CC-BY
Accession Number:
edsbas.A770816F
Database:
BASE

Further Information

In the exascale computing era, tuning High-Performance Computing (HPC) applications has become a significant computational challenge. Although Bayesian optimization (BO) has emerged as a promising tool for HPC performance tuning, the BO workflow is inherently sequential (i.e., one function evaluation at a time) and cannot leverage the vast parallel resources of modern supercomputers, leaving much of their computational capability underutilized. This paper explores the trade-off between search quality and parallelism in BO, investigating a diverse set of methods. Building on both previous approaches from the literature and novel methodologies introduced in this work, our study provides an in-depth analysis of how to accelerate BO-based performance tuning. Using a set of synthetic functions and practical HPC applications, we analyze the interaction among various BO parallelization methods, the amount of parallel resources, the runtime distribution of the target HPC applications, and the costs associated with different search orchestration mechanisms, which have been overlooked in previous studies. Compared to sequential BO, our novel methodology achieves comparable search quality while scaling robustly in search time as the amount of parallel resources increases; it also outperforms a state-of-the-art tuner that supports parallelization, achieving up to 3.67x faster search time. We provide high-value insights for practitioners seeking to leverage parallel computing for efficient HPC application tuning. Additionally, to help researchers accelerate the performance tuning of their own HPC applications, we provide an extension of an existing open-source tuning framework that incorporates our methods.
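The abstract does not describe the paper's speculation strategy in detail. As a generic illustration of how speculation can let an otherwise sequential BO loop propose several evaluations at once, the sketch below uses the well-known constant-liar heuristic: each pending evaluation is provisionally assigned a speculated value (here the best value seen so far), the surrogate is refit, and the next point is chosen. This is a minimal sketch under stated assumptions, not the authors' method or the API of the open-source framework they extend. The toy objective `objective`, the fixed-hyperparameter Gaussian-process surrogate, the candidate grid, and all helper names (`rbf`, `expected_improvement`, `propose_batch`) are hypothetical.

```python
import math
import numpy as np

# Hypothetical 1-D objective standing in for an expensive HPC run (lower is better).
def objective(x: float) -> float:
    return (x - 0.3) ** 2

def rbf(a: np.ndarray, b: np.ndarray, lengthscale: float = 0.2) -> np.ndarray:
    """Squared-exponential kernel with unit signal variance (assumed, fixed hyperparameters)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

# Standard normal CDF built on math.erf, so only numpy and the stdlib are needed.
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def expected_improvement(X, y, cand, jitter=1e-6):
    """GP posterior on `cand` given (X, y), then expected improvement for minimization."""
    mean, std = y.mean(), y.std() or 1.0
    yn = (y - mean) / std  # normalize targets to match the unit signal variance
    K = rbf(X, X) + jitter * np.eye(len(X))
    Ks = rbf(cand, X)  # shape (m, n)
    mu = Ks @ np.linalg.solve(K, yn)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    sigma = np.sqrt(np.maximum(var, 1e-12))  # clamp tiny negative variances from rounding
    imp = yn.min() - mu
    z = imp / sigma
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return imp * _norm_cdf(z) + sigma * pdf

def propose_batch(X, y, cand, used, q):
    """Constant-liar batch: pick q grid indices, speculating each pending result equals min(y)."""
    Xa, ya, used = X.copy(), y.copy(), used.copy()
    lie = y.min()  # speculated value assigned to every pending evaluation
    picks = []
    for _ in range(q):
        ei = expected_improvement(Xa, ya, cand)
        ei[used] = -np.inf  # never re-propose evaluated or pending candidates
        i = int(np.argmax(ei))
        used[i] = True
        picks.append(i)
        Xa = np.append(Xa, cand[i])
        ya = np.append(ya, lie)  # refit as if the speculated result were real
    return picks

cand = np.linspace(0.0, 1.0, 201)
used = np.zeros(len(cand), dtype=bool)
init = [0, 100, 200]  # initial design: x = 0, 0.5, 1
used[init] = True
X = cand[init]
y = np.array([objective(x) for x in X])

for _ in range(3):  # 3 rounds, each proposing q = 4 points at once
    batch = propose_batch(X, y, cand, used, q=4)
    used[batch] = True
    # In a real tuner these q runs would be launched concurrently; here they run in a loop.
    new_y = np.array([objective(cand[i]) for i in batch])
    X = np.append(X, cand[batch])
    y = np.append(y, new_y)

print(f"evaluations: {len(y)}, best x = {X[np.argmin(y)]:.3f}, best f = {y.min():.4f}")
```

The design choice worth noting is that the speculated values are only ever used to refit the surrogate while choosing the rest of the batch; they are discarded once the real results arrive. A more optimistic or pessimistic lie (for example `y.max()` instead of `y.min()`) shifts the batch between exploitation and exploration, which is one face of the quality-versus-parallelism trade-off the abstract refers to.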