Result: Enabling HW-based task scheduling in large multicore architectures

Title:
Enabling HW-based task scheduling in large multicore architectures
Contributors:
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. PM - Programming Models
Publisher Information:
Institute of Electrical and Electronics Engineers (IEEE)
Publication Year:
2024
Collection:
Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge
Document Type:
Academic journal; article in journal/newspaper
File Description:
14 p.; application/pdf
Language:
English
Relation:
https://ieeexplore.ieee.org/document/10284544; info:eu-repo/grantAgreement/EC/H2020/956831/EU/Towards EXtreme scale Technologies and Accelerators for euROhpc hw%2FSw Supercomputing Applications for exascale/TEXTAROSSA; https://hdl.handle.net/2117/395689
DOI:
10.1109/TC.2023.3323781
Rights:
http://creativecommons.org/licenses/by-nc-nd/4.0/ ; Open Access ; Attribution-NonCommercial-NoDerivatives 4.0 International
Accession Number:
edsbas.5B902D13
Database:
BASE

Further information

Dynamic Task Scheduling is an enticing programming model that aims to ease the development of parallel programs with intrinsically irregular or data-dependent parallelism. The performance of such solutions relies on the ability of the Task Scheduling HW/SW stack to efficiently evaluate dependencies at runtime and schedule work to available cores. Traditional SW-only systems incur scheduling overheads of around 30K processor cycles per task, which severely limits the (core count, task granularity) combinations that they can adequately handle. Previous work on HW-accelerated Task Scheduling has shown that such systems can support high-performance scheduling on processors with up to eight cores, but questions remained about whether such solutions can support the greater number of cores now frequently found in high-end SMP systems. This work presents an FPGA-proven, tightly integrated, Linux-capable, 30-core RISC-V system with hardware-accelerated Task Scheduling. We use this implementation to show that HW Task Scheduling can still offer competitive performance at such a high core count, and describe how this organization includes hardware and software optimizations that make it even more scalable than previous solutions. Finally, we outline ways in which this architecture could be augmented to overcome inter-core communication bottlenecks, mitigating the cache-degradation effects usually involved in the parallelization of highly optimized serial code.

This work is supported by the TEXTAROSSA project G.A. n.956831, as part of the EuroHPC initiative, by the Spanish Government (grants PCI2021-121964, TEXTAROSSA; PDC2022-133323-I00, Multi-Ka; PID2019-107255GB-C21 MCIN/AEI/10.13039/501100011033; and CEX2021-001148-S), by Generalitat de Catalunya (2021 SGR 01007), and FAPESP (grant 2019/26702-8).

Peer Reviewed

Postprint (published version)
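The abstract's remark that a per-task scheduling overhead limits the feasible (core count, task granularity) combinations can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes a deliberately simple model in which a single scheduler dispatches tasks one at a time and spends a fixed number of cycles on each, so a new task becomes available only once per overhead period. The function names are hypothetical; the 30K-cycle overhead and the 8-core and 30-core counts are the figures quoted in the abstract.

```python
# Simplified model (an assumption, not the paper's methodology):
# one serialized scheduler spends `overhead_cycles` on every task, so it
# can release at most one task per `overhead_cycles`.

def min_task_cycles(cores: int, overhead_cycles: int) -> int:
    """Smallest task length (in cycles) that keeps `cores` cores busy."""
    # A task must run for as long as it takes the scheduler to hand out
    # one task to each of the other cores and come back around.
    return cores * overhead_cycles


def max_busy_cores(task_cycles: int, overhead_cycles: int) -> float:
    """Average number of cores kept busy by tasks of a given length."""
    # Steady state: tasks arrive every `overhead_cycles` and each occupies
    # a core for `task_cycles`.
    return task_cycles / overhead_cycles


SW_OVERHEAD = 30_000  # cycles per task, the figure quoted in the abstract

for cores in (8, 30):
    print(cores, "cores need tasks of at least",
          min_task_cycles(cores, SW_OVERHEAD), "cycles")
```

Under this model, keeping 8 cores busy requires tasks of at least 240,000 cycles, and 30 cores require at least 900,000 cycles, which is why a lower per-task overhead matters more as the core count grows. Real runtimes overlap scheduling with execution and may distribute the scheduler, so these numbers are only an order-of-magnitude guide.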