Vision Transformers have demonstrated strong advantages over convolutional neural networks across the computer vision community. Nevertheless, our understanding of multi-head self-attention, the de facto core ingredient of Transformers, remains limited, which has spurred growing interest in explaining its underlying mechanism. A notable theory holds that, unlike convolutions, which are sensitive to high frequencies, self-attention behaves as a generalized spatial smoothing operator and increasingly blurs high spatial-frequency signals as depth grows. In this paper, we design a Conv-MSA structure that efficiently extracts local contextual information and remedies this inherent drawback of self-attention. Building on it, we propose FlexFormer, a flexible Transformer architecture whose computational complexity is linear in the input image size. Experimental results on several visual recognition benchmarks show that FlexFormer achieves state-of-the-art results with fewer parameters and higher computational efficiency.
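
To make the Conv-MSA idea concrete, the following is a minimal sketch of how a convolutional local branch might be paired with multi-head self-attention inside one block. It is not the paper's implementation: the class and parameter names (ConvMSABlock, dim, num_heads, kernel_size) are illustrative assumptions, and the attention used here is standard quadratic-cost self-attention rather than the linear-complexity design the abstract describes.

# Minimal sketch, assuming a depthwise-conv local branch fused with
# standard multi-head self-attention; not the paper's actual Conv-MSA.
import torch
import torch.nn as nn


class ConvMSABlock(nn.Module):
    """Hypothetical block: a depthwise convolution preserves high-frequency
    local detail, while multi-head self-attention aggregates global context."""

    def __init__(self, dim: int, num_heads: int = 4, kernel_size: int = 3):
        super().__init__()
        # Local branch: depthwise convolution over the 2D token grid.
        self.local_conv = nn.Conv2d(
            dim, dim, kernel_size, padding=kernel_size // 2, groups=dim
        )
        # Global branch: multi-head self-attention over flattened tokens.
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map.
        b, c, h, w = x.shape
        local = self.local_conv(x)                     # high-frequency local context
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, C)
        global_ctx, _ = self.attn(tokens, tokens, tokens)
        global_ctx = global_ctx.transpose(1, 2).reshape(b, c, h, w)
        return x + local + global_ctx                  # residual fusion of both branches


if __name__ == "__main__":
    block = ConvMSABlock(dim=64, num_heads=4)
    out = block(torch.randn(2, 64, 14, 14))
    print(out.shape)  # torch.Size([2, 64, 14, 14])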