Using Simulation to Understand the Data Layout of Programs
One of the most prominent performance issues on NUMA systems is the latency of accesses to remote memory, which can be several orders of magnitude higher than that of local memory accesses. Effective data allocation that limits the need for remote accesses therefore has the potential to significantly improve application performance. This paper presents a tool that simulates the parallel execution of shared memory programs and provides extensive, detailed information about their run-time data layout. This information allows users to analyze an application's memory access behavior and to specify an optimized data placement in the source code that minimizes remote accesses at run time. Using this simulation tool, a speedup improvement of up to 145.8% has been achieved for numerical kernels, demonstrating the potential of such optimizations.

KEY WORDS: Execution Simulation, Shared Memory Programming, Data Locality
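The abstract does not say how a placement is derived from the simulated run. The following Python sketch is a rough illustration, not the authors' tool. It makes several assumptions: the simulator emits a trace of (thread, address) pairs, each thread is pinned to a fixed NUMA node, and placement is decided per 4 KiB page. The function names and the "majority" policy are also hypothetical. The sketch counts local versus remote accesses under two placements: first-touch, and placing each page on the node that accesses it most often.

```python
from collections import Counter, defaultdict

PAGE_SIZE = 4096  # assumption: placement granularity is a 4 KiB page


def page_of(addr: int, page_size: int = PAGE_SIZE) -> int:
    """Map a byte address to its page number."""
    return addr // page_size


def count_accesses(trace, thread_node, placement, page_size=PAGE_SIZE):
    """Return (local, remote) access counts for a trace under a page->node placement."""
    local = remote = 0
    for thread, addr in trace:
        home = placement[page_of(addr, page_size)]
        if home == thread_node[thread]:
            local += 1
        else:
            remote += 1
    return local, remote


def first_touch_placement(trace, thread_node, page_size=PAGE_SIZE):
    """Place each page on the node of the first thread that touches it."""
    placement = {}
    for thread, addr in trace:
        placement.setdefault(page_of(addr, page_size), thread_node[thread])
    return placement


def majority_placement(trace, thread_node, page_size=PAGE_SIZE):
    """Hypothetical policy: place each page on the node that accesses it most.

    For a static placement (one home node per page, no migration), this
    minimizes the total number of remote accesses in the given trace.
    """
    per_page = defaultdict(Counter)
    for thread, addr in trace:
        per_page[page_of(addr, page_size)][thread_node[thread]] += 1
    return {page: counts.most_common(1)[0][0] for page, counts in per_page.items()}


if __name__ == "__main__":
    thread_node = {0: 0, 1: 1}  # thread 0 runs on node 0, thread 1 on node 1
    # Thread 0 initializes all four pages, then each thread works on its own half.
    trace = [(0, p * PAGE_SIZE) for p in range(4)]
    for _ in range(10):
        trace += [(0, 0), (0, PAGE_SIZE), (1, 2 * PAGE_SIZE), (1, 3 * PAGE_SIZE)]
    for name, policy in [("first-touch", first_touch_placement),
                         ("majority", majority_placement)]:
        placement = policy(trace, thread_node)
        print(name, count_accesses(trace, thread_node, placement))
    # prints:
    # first-touch (24, 20)
    # majority (42, 2)
```

In this toy trace, a single initializing thread causes first-touch to place every page on node 0, so node 1's entire compute phase is remote. The majority placement leaves only two initialization accesses remote. The sketch ignores page migration, contention, and cache effects, all of which a real simulator or placement decision may need to consider.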