Logo image
High performance implementation of the TFT
Conference proceeding

High performance implementation of the TFT

Lingchuan Meng, Jeremy Johnson and Jamila Danielle Johnson
Proceedings of the 39th International Symposium on symbolic and algebraic computation, pp 328-334
23 Jul 2014

Abstract

automatic performance tuning parallelization truncated fourier transform vectorization
This paper reports on a high-performance implementation of the truncated Fourier transform (TFT). A general Cooley-Tukey like algorithm for the TFT is developed that allows the implementation to automatically adapt to the memory hierarchy. Then the algorithm introduces a small relaxation for larger transform sizes which trades off slightly higher arithmetic cost for improved data flow which allows full vectorization and parallelization. The implementation is automatically derived and tuned using the SPIRAL system for code generation and adaptation. The resulting arbitrary-size TFT library smooths out the staircase performance associated with power-of-two modular FFT implementations while retaining the performance associated with state-of-the-art FFT libraries. This provides significant performance improvement over approaches that pad to the next power of two even when using high-performance FFT libraries.

Metrics

16 Record Views
7 citations in Scopus

Details

Logo image