Conference proceeding
Automated performance tuning
Proceedings of the 4th International Workshop on Parallel and Symbolic Computation, pp. 20-21
21 Jul 2010
Abstract
This tutorial presents automated techniques for implementing and optimizing numeric and symbolic libraries on modern computing platforms, including SSE, multicore, and GPU architectures. Obtaining high performance requires effective use of the memory hierarchy, short vector instructions, and multiple cores. Highly tuned implementations are difficult to obtain and are platform dependent. For example, the Intel Core i7 980 XE has a peak floating-point performance of over 100 GFLOPS and the NVIDIA Tesla C870 has a peak floating-point performance of over 500 GFLOPS; however, achieving close to peak performance on such platforms is extremely difficult. Consequently, automated techniques are now being used to tune and adapt high-performance libraries such as ATLAS (math-atlas.sourceforge.net), PLASMA (icl.cs.utk.edu/plasma) and MAGMA (icl.cs.utk.edu/magma) for dense linear algebra, OSKI (bebop.cs.berkeley.edu/oski) for sparse linear algebra, FFTW (www.fftw.org) for the fast Fourier transform (FFT), and SPIRAL (www.spiral.net) for a wide class of digital signal processing (DSP) algorithms. Intel currently uses SPIRAL to generate parts of its MKL and IPP libraries.
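The abstract does not spell out a particular search procedure. As a minimal sketch of the general idea behind empirical autotuning (in the spirit of ATLAS, which times candidate kernel parameters such as blocking factors on the target machine), the Python code below times a tiled matrix multiply for several candidate tile sizes and keeps the fastest. The function names `blocked_matmul` and `autotune_block_size`, the matrix size, and the candidate tile sizes are hypothetical choices for illustration, not APIs of any library named above.

```python
import time


def blocked_matmul(a, b, n, bs):
    """Multiply two n x n matrices (lists of lists) using square tiles of size bs."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Process one bs x bs tile; min() handles ragged edges
                # when bs does not divide n evenly.
                for i in range(ii, min(ii + bs, n)):
                    ai, ci = a[i], c[i]
                    for k in range(kk, min(kk + bs, n)):
                        aik, bk = ai[k], b[k]
                        for j in range(jj, min(jj + bs, n)):
                            ci[j] += aik * bk[j]
    return c


def autotune_block_size(n, candidates, repeats=3):
    """Time each candidate tile size (best of `repeats` runs) and return the fastest."""
    # Hypothetical test matrices; any n x n data would do for timing.
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for bs in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, bs)
            best = min(best, time.perf_counter() - start)
        timings[bs] = best
    best_bs = min(timings, key=timings.get)
    return best_bs, timings


if __name__ == "__main__":
    bs, timings = autotune_block_size(64, [8, 16, 32, 64])
    for cand, t in sorted(timings.items()):
        print(f"block size {cand:3d}: {t * 1e3:.2f} ms")
    print("selected block size:", bs)
```

In pure Python, interpreter overhead rather than cache behavior dominates the timings, so the size chosen here mainly demonstrates the structure of the search loop (generate variants, measure on the target machine, keep the best); a practical tuner would time compiled kernels and typically explore a larger parameter space.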
Details
- Title
- Automated performance tuning
- Creators
- Jeremy Johnson - Drexel University
- Publication Details
- Proceedings of the 4th International Workshop on Parallel and Symbolic Computation, pp. 20-21
- Conference
- 4th International Workshop on Parallel and Symbolic Computation
- Series
- PASCO '10
- Publisher
- Association for Computing Machinery (ACM)
- Number of pages
- 1
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Computer Science
- Scopus ID
- 2-s2.0-77956255147
- Other Identifier
- 991019174740004721