Result: Taskgraph: a low contention OpenMP tasking framework

Title:
Taskgraph: a low contention OpenMP tasking framework
Contributors:
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center
Publication Year:
2023
Collection:
Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge
Document Type:
Academic journal; article in journal/newspaper
File Description:
12 p.; application/pdf
Language:
English
Relation:
https://ieeexplore.ieee.org/document/10146446; info:eu-repo/grantAgreement/EC/H2020/871669/EU/A Model-driven development framework for highly Parallel and EneRgy-Efficient computation supporting multi-criteria optimisation/AMPERE; http://hdl.handle.net/2117/397368
DOI:
10.1109/TPDS.2023.3284219
Rights:
Open Access
Accession Number:
edsbas.1ED63A6B
Database:
BASE

Further information

OpenMP is the de facto standard for shared-memory systems in High-Performance Computing (HPC). It includes a tasking model that offers a high level of abstraction to exploit structured (loop-based) and highly dynamic unstructured (task-based) parallelism in an easy and flexible way. Unfortunately, the run-time overheads introduced to manage tasks are very high in the most common OpenMP frameworks (e.g., GCC, LLVM), which defeats the potential benefits of the tasking model and makes it suitable for coarse-grained tasks only. This paper presents taskgraph, a framework that uses a task dependency graph (TDG) to represent a region of code implemented with OpenMP tasks in order to reduce the run-time overheads associated with the management of tasks, i.e., contention and parallel orchestration, including task creation and synchronization. The TDG avoids the overheads related to the resolution of task dependencies and greatly reduces those deriving from accesses to shared resources. Moreover, the taskgraph framework introduces a record-and-replay execution model into OpenMP that accelerates the taskgraph region from its second execution onward. Overall, the multiple optimizations presented in this paper allow exploiting fine-grained OpenMP tasks, in line with the trend in current applications toward massive on-node parallelism and fine-grained, dynamic scheduling paradigms. The framework is implemented on LLVM 15.0. Results show that the taskgraph implementation outperforms the vanilla OpenMP system in terms of performance and scalability, for both structured and unstructured parallelism, and for both coarse- and fine-grained tasks. Furthermore, the proposed framework makes the tasking model a competitive alternative to the OpenMP thread model in most cases.
This work was supported in part by the Generalitat de Catalunya project RESPECT under Grant 2021 PROD 00179 and in part by the EU H2020 project AMPERE under Grant 871669. Peer Reviewed. Postprint (author's final draft).
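The record-and-replay idea described in the abstract can be illustrated with a toy model. The following is a minimal sketch in Python, not the paper's LLVM implementation: the class `TaskGraphSketch`, its methods, and the `ins`/`outs` parameters (loosely modelled on OpenMP `depend(in:...)` and `depend(out:...)` clauses) are hypothetical names chosen for illustration, and execution is serial. It assumes the task structure of the region is identical on every execution. The first run traverses the region once and resolves dependencies into a task dependency graph; later runs skip both steps and only walk the stored edges.

```python
from collections import defaultdict, deque


class TaskGraphSketch:
    """Toy, serial record-and-replay task dependency graph (hypothetical)."""

    def __init__(self):
        self.tasks = []      # task bodies, in creation order
        self.preds = []      # preds[i]: ids of tasks that task i waits for
        self.succs = []      # succs[i]: ids of tasks released by task i
        self.recorded = False

    def _record(self, region):
        last_writer = {}             # data name -> id of last writing task
        readers = defaultdict(list)  # data name -> readers since last write

        def task(fn, ins=(), outs=()):
            tid = len(self.tasks)
            deps = set()
            for name in ins:         # read-after-write
                if name in last_writer:
                    deps.add(last_writer[name])
            for name in outs:        # write-after-write, write-after-read
                if name in last_writer:
                    deps.add(last_writer[name])
                deps.update(readers[name])
            for name in ins:
                readers[name].append(tid)
            for name in outs:
                last_writer[name] = tid
                readers[name] = []
            self.tasks.append(fn)
            self.preds.append(deps)
            self.succs.append([])
            for d in deps:
                self.succs[d].append(tid)

        region(task)                 # traverse the region exactly once
        self.recorded = True

    def _replay(self, data):
        # No dependency resolution here: only stored edges and counters.
        pending = [len(p) for p in self.preds]
        ready = deque(i for i, n in enumerate(pending) if n == 0)
        while ready:
            tid = ready.popleft()
            self.tasks[tid](data)
            for s in self.succs[tid]:
                pending[s] -= 1
                if pending[s] == 0:
                    ready.append(s)

    def run(self, region, data):
        if not self.recorded:        # first execution: record the TDG
            self._record(region)
        self._replay(data)           # every execution: replay the TDG


def example_region(task):
    """Three tasks: two independent producers and one consumer."""
    task(lambda d: d.update(a=d["x"] + 1), ins=("x",), outs=("a",))
    task(lambda d: d.update(b=d["x"] * 2), ins=("x",), outs=("b",))
    task(lambda d: d.update(c=d["a"] + d["b"]), ins=("a", "b"), outs=("c",))


if __name__ == "__main__":
    graph = TaskGraphSketch()
    for x in (3, 10):
        data = {"x": x}
        graph.run(example_region, data)
        print(x, data["c"])
```

In this sketch the first call to `run` both records and executes, while the second call reuses the stored predecessor and successor lists. A real runtime would additionally execute ready tasks in parallel and would have to manage the shared counters with low contention, which this serial model does not attempt to show.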