A symbolic emulator for shuffle synthesis on the NVIDIA PTX code

Title:

A symbolic emulator for shuffle synthesis on the NVIDIA PTX code

Authors:

Matsumura, Kazuaki; García de Gonzalo, Simón; Peña Monferrer, Antonio José

Contributors:

Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors (Doctoral Programme in Computer Architecture); Barcelona Supercomputing Center

Publisher Information:

Association for Computing Machinery (ACM)

Publication Year:

2023

Collection:

Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge

Subject Terms:

UPC subject areas::Computer science::Computer architecture, Graphics processing units, Parallel processing (Electronic computers), Compilers (Computer programs), Compiler, Symbolic analysis, Code generation, GPUs, NVIDIA PTX, Program optimization

Document Type:

Conference object

File Description:

12 pages; application/pdf

Language:

English

Relation:

info:eu-repo/grantAgreement/EC/H2020/801051/EU/European joint Effort toward a Highly Productive Programming Environment for Heterogeneous Exascale Computing (EPEEC)/EPEEC; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C21/ES/BSC - COMPUTACION DE ALTAS PRESTACIONES VIII/; http://hdl.handle.net/2117/384604

DOI:

10.1145/3578360.3580253

Availability:

http://hdl.handle.net/2117/384604
https://doi.org/10.1145/3578360.3580253

Rights:

Attribution 4.0 International ; http://creativecommons.org/licenses/by/4.0/ ; Open Access

Accession Number:

edsbas.68249573

Database:

BASE

Description:

Many kinds of applications use GPUs through tools that attempt to automatically exploit the performance of the GPU's parallel architecture. Directive-based programming models such as OpenACC are one such method: they enable parallel computing simply by adding annotations to code loops. Such abstract models, however, often prevent programmers from making additional low-level optimizations that exploit the advanced architectural features of GPUs, because the generated computation is hidden from the application developer. This paper describes and implements a novel, flexible optimization technique that inserts a code emulator phase at the tail end of the compilation pipeline. Our tool emulates the generated code using symbolic analysis, substituting symbols for dynamic (run-time) information, which allows further low-level code optimizations to be applied. We implement our tool to support both CUDA and OpenACC directives as frontends of the compilation pipeline, thus enabling low-level GPU optimizations for OpenACC that were not previously possible. We demonstrate the capabilities of our tool by automatically generating warp-level shuffle instructions, which are difficult to use even for advanced GPU programmers. Finally, by evaluating our tool on a benchmark suite and complex application code, we provide a detailed study of the benefits of shuffle instructions across four generations of GPU architectures.

We are funded by the EPEEC project from the European Union's Horizon 2020 research and innovation program under grant agreement No. 801051 and by the Ministerio de Ciencia e Innovación-Agencia Estatal de Investigación (PID2019-107255GB-C21/AEI/10.13039/501100011033). This work has been partially carried out on the ACME cluster owned by CIEMAT and funded by the Spanish Ministry of Economy and Competitiveness project CODEC-OSE (RTI2018-096006-B-I00).

Peer Reviewed; Postprint (published version)
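The record only summarizes the technique. As a hypothetical illustration of the kind of symbolic reasoning that shuffle synthesis relies on (not the authors' emulator), the sketch below assumes every load address is affine in the lane index, base + coeff * lane + const, and asks whether a second load can take its value from another lane's earlier load instead of going back to memory. `AffineAddr`, `ShufflePlan`, `plan_shuffle` and `plan_is_sound` are illustrative names, not part of any tool; a full 32-lane warp with all lanes active is also assumed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs (assumed fully active here)


@dataclass(frozen=True)
class AffineAddr:
    """Hypothetical symbolic address: base + coeff * lane + const (element units)."""
    base: str   # symbolic array base pointer, e.g. "a"
    coeff: int  # multiplier of the lane index
    const: int  # constant element offset

    def at(self, lane: int) -> int:
        """Concrete element offset from `base` for one lane."""
        return self.coeff * lane + self.const


@dataclass(frozen=True)
class ShufflePlan:
    delta: int                       # source lane = lane + delta
    direction: str                   # "down" (delta > 0) or "up" (delta < 0)
    fallback_lanes: Tuple[int, ...]  # lanes whose source lane is outside the warp


def plan_shuffle(producer: AffineAddr, consumer: AffineAddr) -> Optional[ShufflePlan]:
    """Decide whether `consumer` can reuse `producer`'s loaded value via a shuffle.

    Consumer lane l reads base + c*l + k2; producer lane s read base + c*s + k1.
    The addresses match when s = l + (k2 - k1) / c.
    """
    if producer.base != consumer.base or producer.coeff != consumer.coeff:
        return None  # different arrays or strides: no symbolic match
    if producer.coeff == 0:
        return None  # lane-uniform address: a broadcast, not a lane shift
    diff = consumer.const - producer.const
    if diff % producer.coeff != 0:
        return None  # no producer lane ever loaded this address
    delta = diff // producer.coeff
    if delta == 0 or abs(delta) >= WARP_SIZE:
        return None  # identical load (plain reuse) or no source lane in range
    fallback = tuple(
        lane for lane in range(WARP_SIZE) if not 0 <= lane + delta < WARP_SIZE
    )
    return ShufflePlan(delta, "down" if delta > 0 else "up", fallback)


def plan_is_sound(producer: AffineAddr, consumer: AffineAddr, plan: ShufflePlan) -> bool:
    """Emulate every lane and confirm each shuffled value comes from the right address."""
    for lane in range(WARP_SIZE):
        if lane in plan.fallback_lanes:
            continue  # these lanes keep their original memory load
        if consumer.at(lane) != producer.at(lane + plan.delta):
            return False
    return True


if __name__ == "__main__":
    # 1D stencil: each thread loads a[tid] and a[tid + 1].
    center = AffineAddr("a", 1, 0)
    right = AffineAddr("a", 1, 1)
    plan = plan_shuffle(center, right)
    print(plan)  # ShufflePlan(delta=1, direction='down', fallback_lanes=(31,))
    print(plan_is_sound(center, right, plan))  # True
```

In CUDA terms, a "down" plan with delta 1 would correspond to `__shfl_down_sync(0xffffffff, v, 1)` (PTX `shfl.sync.down.b32`), with the lanes in `fallback_lanes` keeping their original global load. Checking soundness by emulating each lane mirrors, in miniature, the idea of replacing run-time values with symbols so that the rewrite can be validated at compile time.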
