Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation

Jin Sun; Xiaoshuang Shi; Zhiyuan Wang; Kaidi Xu; Heng Tao Shen; Xiaofeng Zhu

doi:10.1145/3664647.3680809

Conference proceeding

Proceedings of the 32nd ACM International Conference on Multimedia, pp 7123-7132

28 Oct 2024

Published, Version of Record (VoR) Restricted

Abstract

Networks -- Network architectures -- Network design principles -- Layering

Modeling in computer vision has evolved toward MLPs. Vision MLPs inherently lack local modeling capability, and the simplest remedy is to combine them with convolutional layers. Convolution, known for its sliding-window scheme, also suffers from that scheme's redundancy and limited parallelism. In this paper, we seek to dispense with the windowing scheme and introduce a more elaborate and parallelizable method to exploit locality. To this end, we propose a new MLP module, Shifted-Pillars-Concatenation (SPC), which consists of two steps: (1) Pillars-Shift, which generates four neighboring maps by shifting the input image along four directions, and (2) Pillars-Concatenation, which applies linear transformations to the maps and concatenates them to aggregate local features. The SPC module offers superior local modeling power and performance gains, making it a promising alternative to the convolutional layer. We then build a pure-MLP architecture, Caterpillar, by replacing the convolutional layer with the SPC module in the hybrid model sMLPNet. Extensive experiments show Caterpillar's excellent performance on both small-scale and ImageNet-1k classification benchmarks, along with remarkable scalability and transfer capability. The code is available at https://github.com/sunjin19126/Caterpillar.
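The two SPC steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (the GitHub repository linked above holds that). The one-pixel shift, zero-filling of vacated border positions, channel-last (H, W, C) layout, and per-direction projection from C to C/4 channels are all assumptions chosen for illustration, and the function names `pillars_shift`, `pillars_concat`, and `spc` are hypothetical.

```python
import numpy as np


def pillars_shift(x, step=1):
    """Pillars-Shift sketch: return four copies of x (H, W, C), shifted up,
    down, left and right by `step` pixels.

    Assumption: vacated border positions are zero-filled.
    """
    h, w, _ = x.shape
    up, down, left, right = (np.zeros_like(x) for _ in range(4))
    up[: h - step] = x[step:]              # row i now holds row i + step
    down[step:] = x[: h - step]            # row i now holds row i - step
    left[:, : w - step] = x[:, step:]      # column j now holds column j + step
    right[:, step:] = x[:, : w - step]     # column j now holds column j - step
    return [up, down, left, right]


def pillars_concat(maps, weights):
    """Pillars-Concatenation sketch: apply one linear map (C -> C_out) per
    shifted map, then concatenate the results along the channel axis."""
    projected = [m @ w for m, w in zip(maps, weights)]
    return np.concatenate(projected, axis=-1)


def spc(x, weights, step=1):
    """Hypothetical SPC module: Pillars-Shift followed by Pillars-Concatenation."""
    return pillars_concat(pillars_shift(x, step), weights)


# Usage: an 8x8 feature map with 16 channels; each direction is projected
# to 16 // 4 = 4 channels (assumption), so the output has 16 channels again.
rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
x = rng.standard_normal((H, W, C))
weights = [rng.standard_normal((C, C // 4)) for _ in range(4)]
y = spc(x, weights)
print(y.shape)  # (8, 8, 16)
```

Under these assumptions, each output position gains access to its four immediate neighbors without a sliding window: the whole operation reduces to a few array slices and matrix multiplications, which is what makes it straightforward to parallelize.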

Details

Title: Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation
Creators: Jin Sun - University of Electronic Science and Technology of China
Xiaoshuang Shi - University of Electronic Science and Technology of China
Zhiyuan Wang - University of Electronic Science and Technology of China
Kaidi Xu - Drexel University
Heng Tao Shen - University of Electronic Science and Technology of China
Xiaofeng Zhu - University of Electronic Science and Technology of China
Publication Details: Proceedings of the 32nd ACM International Conference on Multimedia, pp 7123-7132
Conference: MM '24: The 32nd ACM International Conference on Multimedia
Series: ACM Conferences
Publisher: Association for Computing Machinery
Number of pages: 10
Resource Type: Conference proceeding
Language: English
Academic Unit: Computer Science
Scopus ID: 2-s2.0-85209778586
Other Identifier: 991021932115404721