Conference proceeding
Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation
Proceedings of the 32nd ACM International Conference on Multimedia, pp 7123-7132
28 Oct 2024
Abstract
Modeling in computer vision has evolved to MLPs. Vision MLPs inherently lack local modeling capability, and the simplest remedy is to combine them with convolutional layers. Convolution, known for its sliding-window scheme, also suffers from the redundancy and limited parallelism that this scheme entails. In this paper, we seek to dispense with the windowing scheme and introduce a more elaborate and parallelizable method to exploit locality. To this end, we propose a new MLP module, Shifted-Pillars-Concatenation (SPC), which consists of two steps: (1) Pillars-Shift, which generates four neighboring maps by shifting the input image along four directions, and (2) Pillars-Concatenation, which applies linear transformations and concatenation to the maps to aggregate local features. The SPC module offers superior local modeling power and performance gains, making it a promising alternative to the convolutional layer. We then build a pure-MLP architecture called Caterpillar by replacing the convolutional layer with the SPC module in sMLPNet, a hybrid model. Extensive experiments show that Caterpillar performs excellently on both small-scale and ImageNet-1k classification benchmarks, and that it also scales and transfers well. The code is available at https://github.com/sunjin19126/Caterpillar.
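The two SPC steps named in the abstract can be illustrated with a small dependency-free sketch. This is not the authors' implementation (their code is at the repository linked above); it only mirrors the description "shift along four directions, then apply linear transformations and concatenate". The function names (`shift`, `pillars_shift`, `pillars_concat`, `spc`), the zero-filling of vacated border cells, the shift step of 1, and the identity weights in the demo are all assumptions made for illustration.

```python
from typing import List

Pixel = List[float]
FeatureMap = List[List[Pixel]]   # indexed as [row][col][channel]
Weight = List[List[float]]       # indexed as [out_channel][in_channel]


def shift(x: FeatureMap, dy: int, dx: int) -> FeatureMap:
    """Move x by (dy, dx) pixels; vacated cells are zero-filled (assumption)."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            si, sj = i - dy, j - dx  # source position that lands on (i, j)
            if 0 <= si < h and 0 <= sj < w:
                out[i][j] = list(x[si][sj])
    return out


def pillars_shift(x: FeatureMap, step: int = 1) -> List[FeatureMap]:
    """Step 1: four neighboring maps, shifted down, up, right and left."""
    return [shift(x, step, 0), shift(x, -step, 0),
            shift(x, 0, step), shift(x, 0, -step)]


def linear(pixel: Pixel, weight: Weight) -> Pixel:
    """Per-pixel linear transformation (no bias, for brevity)."""
    return [sum(w_k * p_k for w_k, p_k in zip(row, pixel)) for row in weight]


def pillars_concat(maps: List[FeatureMap], weights: List[Weight]) -> FeatureMap:
    """Step 2: transform each map with its own weights, then concatenate
    the results along the channel axis."""
    h, w = len(maps[0]), len(maps[0][0])
    out: FeatureMap = []
    for i in range(h):
        row = []
        for j in range(w):
            feat: Pixel = []
            for m, wt in zip(maps, weights):
                feat.extend(linear(m[i][j], wt))
            row.append(feat)
        out.append(row)
    return out


def spc(x: FeatureMap, weights: List[Weight], step: int = 1) -> FeatureMap:
    """Shift, then transform and concatenate; no sliding window is involved."""
    return pillars_concat(pillars_shift(x, step), weights)


# Demo: a 3x3 single-channel map holding 1..9, with identity weights so the
# output channels expose which neighbor each shifted map contributes.
x = [[[float(3 * i + j + 1)] for j in range(3)] for i in range(3)]
weights = [[[1.0]] for _ in range(4)]
y = spc(x, weights)
print(y[1][1])  # center pixel: [2.0, 8.0, 4.0, 6.0] (upper, lower, left, right)
```

With identity weights, each output pixel simply collects its four neighbors, which shows how locality is gathered by whole-map shifts rather than by a window visiting each position. Learned weights would mix the channels of each shifted map before concatenation.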
Details
- Title
- Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation
- Creators
- Jin Sun - University of Electronic Science and Technology of China; Xiaoshuang Shi - University of Electronic Science and Technology of China; Zhiyuan Wang - University of Electronic Science and Technology of China; Kaidi Xu - Drexel University; Heng Tao Shen - University of Electronic Science and Technology of China; Xiaofeng Zhu - University of Electronic Science and Technology of China
- Publication Details
- Proceedings of the 32nd ACM International Conference on Multimedia, pp 7123-7132
- Conference
- MM '24: The 32nd ACM International Conference on Multimedia
- Series
- ACM Conferences
- Publisher
- Association for Computing Machinery
- Number of pages
- 10
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Computer Science
- Scopus ID
- 2-s2.0-85209778586
- Other Identifier
- 991021932115404721