Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation

Jin Sun; Xiaoshuang Shi; Zhiyuan Wang; Kaidi Xu; Heng Tao Shen; Xiaofeng Zhu

doi:10.48550/arxiv.2305.17644

arXiv.org

30 Nov 2023

Files and links (1): https://doi.org/10.48550/arxiv.2305.17644 (Preprint, author's original; arXiv.org non-exclusive license to distribute; open access)

Abstract

Computer Science - Computer Vision and Pattern Recognition

Modeling in computer vision has evolved toward MLPs. Vision MLPs inherently lack local modeling capability, and the simplest remedy is to combine them with convolutional layers. Convolution, known for its sliding-window scheme, in turn suffers from the redundancy and low computational efficiency of that scheme. In this paper, we seek to dispense with the windowing scheme and introduce a more elaborate and effective approach to exploiting locality. To this end, we propose a new MLP module, Shifted-Pillars-Concatenation (SPC), which consists of two steps: (1) Pillars-Shift, which generates four neighboring maps by shifting the input image along four directions, and (2) Pillars-Concatenation, which applies linear transformations to the maps and concatenates the results to aggregate local features. The SPC module offers superior local modeling power and performance gains, making it a promising alternative to the convolutional layer. We then build a pure-MLP architecture called Caterpillar by replacing the convolutional layer with the SPC module in the hybrid model sMLPNet. Extensive experiments show Caterpillar's excellent performance and scalability on both ImageNet-1K and small-scale classification benchmarks.
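The two SPC steps described in the abstract can be illustrated with a short NumPy sketch. It rests on assumptions the abstract does not state: the input is a single feature map of shape (C, H, W), each shift moves the map by one pixel with zero fill, and each shifted map gets its own per-pixel linear projection to C/4 channels before the four results are concatenated back to C channels. The names `pillars_shift`, `pillars_concatenation` and `spc` are hypothetical and are not the authors' code.

```python
import numpy as np


def pillars_shift(x: np.ndarray, step: int = 1) -> list[np.ndarray]:
    """Return four copies of x (C, H, W) shifted up, down, left and right.

    Assumption (not from the abstract): vacated positions are zero-filled
    and the shift size is `step` pixels.
    """
    c, h, w = x.shape
    up, down = np.zeros_like(x), np.zeros_like(x)
    left, right = np.zeros_like(x), np.zeros_like(x)
    up[:, : h - step, :] = x[:, step:, :]      # content moves up by `step` rows
    down[:, step:, :] = x[:, : h - step, :]    # content moves down
    left[:, :, : w - step] = x[:, :, step:]    # content moves left
    right[:, :, step:] = x[:, :, : w - step]   # content moves right
    return [up, down, left, right]


def pillars_concatenation(maps, weights, biases) -> np.ndarray:
    """Apply a separate linear map to each shifted map, then concatenate
    the outputs along the channel axis (hypothetical parameterization)."""
    outs = []
    for m, wgt, b in zip(maps, weights, biases):
        # wgt has shape (C_out, C_in) and is applied at every pixel,
        # i.e. a 1x1 projection with no spatial window
        outs.append(np.einsum("oc,chw->ohw", wgt, m) + b[:, None, None])
    return np.concatenate(outs, axis=0)


def spc(x, weights, biases, step=1):
    """Hypothetical SPC sketch: Pillars-Shift followed by Pillars-Concatenation."""
    return pillars_concatenation(pillars_shift(x, step), weights, biases)


# Usage: C = 8 channels, each of the four projections maps 8 -> 2 channels
rng = np.random.default_rng(0)
C, H, W = 8, 5, 5
x = rng.standard_normal((C, H, W))
weights = [rng.standard_normal((C // 4, C)) for _ in range(4)]
biases = [np.zeros(C // 4) for _ in range(4)]
y = spc(x, weights, biases)
print(y.shape)  # (8, 5, 5)
```

In this sketch each output pixel is built only from its four immediate neighbors through plain matrix products, which is one way to read "aggregate local features" without a sliding-window kernel; the exact shift size, padding and channel split used in Caterpillar may differ.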

Details

Title: Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation
Creators: Jin Sun; Xiaoshuang Shi; Zhiyuan Wang; Kaidi Xu; Heng Tao Shen; Xiaofeng Zhu
Publication Details: arXiv.org
Resource Type: Preprint
Language: English
Academic Unit: Computer Science
Other Identifier: 991021871335504721

Drexel University