Result: Accelerating many-core, heterogeneous, and distributed architectures with hardware runtimes and programming models

Title:
Accelerating many-core, heterogeneous, and distributed architectures with hardware runtimes and programming models
Contributors:
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Álvarez Martínez, Carlos, Jiménez González, Daniel
Publisher Information:
Universitat Politècnica de Catalunya
Publication Year:
2025
Collection:
Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge
Document Type:
Doctoral or postdoctoral thesis (dissertation)
File Description:
220 p.; application/pdf
Language:
English
DOI:
10.5821/dissertation-2117-442722
Rights:
http://creativecommons.org/licenses/by/4.0/ ; Open Access ; Attribution 4.0 International
Accession Number:
edsbas.4F2DBA96
Database:
BASE

Further Information

(English) Driven by increasing concern about energy efficiency and the current trend of scaling out HPC systems to many computing nodes, this thesis tackles both problems with the help of hardware acceleration and programming models. Regarding the first topic, FPGAs are the target of study because of their flexibility to adapt to any computing workload and their high energy efficiency. We present extensions to the OmpSs@FPGA framework, which provides a high-level task-based programming interface to non-FPGA experts. These extensions include compiler directives to automatically optimize FPGA code, a hardware task-scheduling runtime with dependence analysis called POM, and a multi-FPGA MPI-like API and runtime called OMPIF. In addition, we present the Implicit Message Passing (IMP) model, which combines task-based and message-passing programming models, leveraging dependence information and a static data distribution. IMP automatically communicates data between nodes when the data dependencies of a task require it, so the user does not need to write any MPI or OMPIF calls in the code; this is handled by IMP. We evaluate this model on both FPGA and CPU clusters, with hardware acceleration for task scheduling and message passing using the POM and OMPIF runtimes. For CPU clusters, we study several ways to incorporate POM into an SoC: first with an embedded FPGA, then designed as an ASIC for a RISC-V core, and finally in an FPGA softcore also based on RISC-V. In the last case, we use both POM and OMPIF to evaluate distributed applications with a cluster of FPGAs that emulates a CPU cluster. We evaluate IMP and regular MPI+tasks programming with several benchmarks: Matrix Multiply, Spectra, N-body, Heat, and Cholesky. With these contributions, we achieve several objectives. First, we demonstrate that with OmpSs@FPGA we can achieve absolute performance similar to a CPU node for some benchmarks, like N-body, and outperform, in energy efficiency, similar CPU and GPU ...
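To make the programming style described in the abstract more concrete, the following is a minimal, hedged sketch in C of a blocked matrix multiply written as dependence-annotated tasks. The "#pragma oss task"/"taskwait" spelling follows OmpSs-2 conventions; the shaping expressions, block size, and buffer names are illustrative assumptions, not the exact OmpSs@FPGA or IMP syntax used in the thesis. The point it illustrates is that the loop nest contains no MPI or OMPIF call: under IMP, inter-node transfers would be derived by the runtime from the in/inout dependences and a static data distribution.

/* Hedged sketch: blocked matrix multiply in a task-based, dependence-
 * annotated style (OmpSs-like).  The pragmas are illustrative; a compiler
 * that does not know them ignores them and the program runs sequentially,
 * so the file stays self-contained and runnable. */
#include <stdio.h>
#include <stdlib.h>

#define N  1024          /* matrix dimension               */
#define BS 256           /* block (tile) size              */
#define NB (N / BS)      /* number of blocks per dimension */

/* One tile-level multiply-accumulate: C += A * B.
 * The in/inout clauses declare the data each task reads and writes;
 * a runtime such as POM builds the dependence graph from them. */
#pragma oss task in([BS*BS]a, [BS*BS]b) inout([BS*BS]c)
void gemm_block(const float *a, const float *b, float *c)
{
    for (int i = 0; i < BS; i++)
        for (int k = 0; k < BS; k++)
            for (int j = 0; j < BS; j++)
                c[i * BS + j] += a[i * BS + k] * b[k * BS + j];
}

int main(void)
{
    /* Tiled storage: one contiguous BS*BS buffer per block. */
    float *A[NB][NB], *B[NB][NB], *C[NB][NB];

    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++) {
            A[i][j] = malloc(sizeof(float) * BS * BS);
            B[i][j] = malloc(sizeof(float) * BS * BS);
            C[i][j] = calloc(BS * BS, sizeof(float));
            for (int k = 0; k < BS * BS; k++) {
                A[i][j][k] = 1.0f;
                B[i][j][k] = 2.0f;
            }
        }

    /* No MPI/OMPIF calls here: with IMP, tiles owned by remote nodes
     * would be fetched automatically before the task executes. */
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int k = 0; k < NB; k++)
                gemm_block(A[i][k], B[k][j], C[i][j]);

    #pragma oss taskwait      /* wait for all outstanding tasks */

    /* Each element should equal 2*N with the all-ones/all-twos inputs. */
    printf("C[0][0][0] = %.1f (expected %.1f)\n", C[0][0][0], 2.0f * N);

    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++) {
            free(A[i][j]);
            free(B[i][j]);
            free(C[i][j]);
        }
    return 0;
}

In an MPI+tasks version of the same benchmark, the user would additionally partition the tiles across ranks and insert explicit send/receive (or OMPIF) calls around the task loop; under IMP that communication code disappears, which is the contrast the thesis evaluates.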