Conference proceeding
High performance implementation of the inverse TFT
Proceedings of the 2015 International Workshop on parallel symbolic computation, v 10, pp 87-94
10 Jul 2015
Abstract
The inverse truncated Fourier transform (ITFT) is a key component in the fast polynomial and large integer algorithms introduced by van der Hoeven. This paper reports a high performance implementation of the ITFT which poses additional challenges compared to that of the forward transform. A general-radix variant of the ITFT algorithm is developed to allow the implementation to automatically adapt to the memory hierarchy. Then a parallel ITFT algorithm is developed that trades off small arithmetic cost for full vectorization and improved multi-threaded parallelism. The algorithms are automatically generated and tuned to produce an arbitrary-size ITFT library. The new algorithms and the implementation smooths out the staircase performance associated with power-of-two modular FFT implementations, and provide significant performance improvement over zero-padding approaches even when high-performance FFT libraries are used.
Metrics
8 Record Views
2 citations in Scopus
Details
- Title
- High performance implementation of the inverse TFT
- Creators
- Lingchuan Meng - Drexel UniversityJeremy Johnson - Drexel UniversityJamila Danielle Johnson - Bennett S. LeBow College of Business (1999-)
- Publication Details
- Proceedings of the 2015 International Workshop on parallel symbolic computation, v 10, pp 87-94
- Conference
- 2015 International Workshop on parallel symbolic computation
- Series
- PASCO '15
- Publisher
- Association for Computing Machinery (ACM)
- Number of pages
- 1
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Bennett S. LeBow College of Business; Computer Science
- Scopus ID
- 2-s2.0-84958055512
- Other Identifier
- 991019173733004721