Modern vector architectures for high-performance computing
Recent generations of general-purpose central processing units (CPUs) for the high-performance segment have had to adopt new approaches in order to deliver increasing performance. Clock frequency has increased little, but the number of cores per chip has increased several-fold in a single decade. Inside each core, single instruction, multiple data (SIMD) capabilities have also increased in capacity, resulting in modern vector processors that can achieve peak performance close to that of graphics processing units (GPUs), while maintaining the versatility of a general-purpose processor. These increases in compute power, however, have not been matched by similar advances in memory performance. These architectural changes have coincided with another change in the high-performance computing (HPC) landscape: Arm-based processor designs have made their way into supercomputer systems alongside commodity x86 processors. These designs have come in the form of custom implementations from several vendors, and they aim to address deficiencies in both compute and memory performance for the HPC environment. Arm's implementation of wide SIMD is called the Scalable Vector Extension (SVE), and it represents a modern implementation of ideas first seen in the vector architectures of the original Cray supercomputers of the 1970s. For memory bandwidth, the novelty of these Arm-based designs lies in a significant increase in the number of memory channels available, and even in bringing high-bandwidth memory from GPUs to CPUs. This thesis is a study of modern CPU architectures for HPC. The focus of this research is on the efficacy of the vector capabilities in these new processors, which it investigates from the twin perspectives of performance and programmability.
The initial experiments are performed on the first Arm-based hardware adopted in HPC. They build up to experiments in simulated and emulated environments that examine the challenges faced by a wide vector instruction set like SVE, and finally to an analysis of the real-world performance of the first implementation of SVE in hardware. The thesis concludes with an outlook towards the next generations of high-performance processors, highlighting the need for co-design in the quest for performance, and suggesting future research avenues for a new generation of performance tools that can enable informed design decisions for upcoming hardware.