Result: Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures

Title:

Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures

Authors:

Catalán Pallarés, Sandra, Igual Peña, Francisco D., Herrero Zaragoza, José Ramón, Rodríguez Sánchez, Rafael, Quintana Ortí, Enrique Salvador

Contributors:

Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. PM - Programming Models

Publisher Information:

Elsevier

Publication Year:

2023

Collection:

Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge

Subject Terms:

Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles, Memory management (Computer science), Parallel programming (Computer science), NUMA architectures, Chiplets, Dense linear algebra, Shared memory programming, Portability, Gestió de memòria (Informàtica), Programació en paral·lel (Informàtica)

Document Type:

Academic journal; article in journal/newspaper

File Description:

15 p.; application/pdf

Language:

English

Relation:

https://www.sciencedirect.com/science/article/pii/S0743731523000047; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C22/ES/UPC-COMPUTACION DE ALTAS PRESTACIONES VIII/; http://hdl.handle.net/2117/386040

DOI:

10.1016/j.jpdc.2023.01.004

Availability:

http://hdl.handle.net/2117/386040
https://doi.org/10.1016/j.jpdc.2023.01.004

Rights:

Attribution-NonCommercial-NoDerivatives 4.0 International ; http://creativecommons.org/licenses/by-nc-nd/4.0/ ; Open Access

Accession Number:

edsbas.A5A12057

Database:

BASE

Further Information:

We propose a methodology to address the programmability issues derived from the emergence of new-generation shared-memory NUMA architectures. For this purpose, we employ dense matrix factorizations and matrix inversion (DMFI) as a use case, and we target two modern architectures (AMD Rome and Huawei Kunpeng 920) that exhibit configurable NUMA topologies. Our methodology pursues performance portability across different NUMA configurations by proposing multi-domain implementations for DMFI plus a hybrid task- and loop-level parallelization that configures multi-threaded executions to fix core-to-data binding, exploiting locality at the expense of minor code modifications. In addition, we introduce a generalization of the multi-domain implementations for DMFI that offers support for virtually any NUMA topology in present and future architectures. Our experimentation on the two target architectures for three representative dense linear algebra operations validates the proposal, reveals insights on the necessity of adapting both the codes and their execution to improve data access locality, and reports performance across architectures and inter- and intra-socket NUMA configurations competitive with state-of-the-art message-passing implementations, maintaining the ease of development usually associated with shared-memory programming.

This research was sponsored by project PID2019-107255GB of Ministerio de Ciencia, Innovación y Universidades; project S2018/TCS-4423 of Comunidad de Madrid; project 2017-SGR-1414 of the Generalitat de Catalunya and the Madrid Government under the Multiannual Agreement with UCM in the line Program to Stimulate Research for Young Doctors in the context of the V PRICIT, project PR65/19-22445. This project has also received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. The JU receives support from the European Union’s Horizon 2020 research and innovation programme, and Spain, Germany, France, Italy, Poland, Switzerland, Norway. ...
