Treffer: Automatic Library Generation and Performance Tuning for Modular Polynomial Multiplication
Weitere Informationen
Polynomial multiplication is a key algorithm underlying computer algebra systems (CAS) and its efficient implementation is crucial for the performance of CAS. In this context coefficients of polynomials come from domains such as the integers, rationals and finite fields where arithmetic is performed exactly without rounding error. Obtaining peak performance on modern computer architectures is difficult due to the complexity of the memory hierarchy, and the need for efficient parallelism on multiple levels ranging from instruction level parallelism (ILP) and vector parallelism to multi-core parallelism. Tuning the performance of code to fully utilize all of these components requires modifying the implementation to the particular architecture and extensive experimentation to determine the best choices of algorithm and implementation strategies. Moreover, this time consuming process must be repeated whenever the architecture changes, as the particular choices for one machine are not optimal for another, even when relatively minor changes are made in the architecture. In numeric (floating point based) scientific and mathematical libraries where performance is crucial, considerable effort has been made to optimize key routines across multiple architectures. This performance tuning has been aided by the use of automation where many code choices are generated and intelligent search is utilized to find the “best” implementation on a given architecture. The performance of autotuned implementations is comparable to, and in some cases better than, the best hand-tuned code. In this thesis we design and implement algorithms for polynomial multiplication using a fast Fourier transform (FFT) based approach. We improve on the state-of-the-art in both theoretical and practical performance. The SPIRAL library generation system is used to automatically generate and tune the performance of a polynomial multiplication library that is optimized for memory hierarchy, vectorization and multi-threading, using new and existing ...