FlexFormer: Flexible Transformer for efficient visual recognition

Cited by: 7
Authors
Fan, Xinyi [1 ]
Liu, Huajun [1 ]
Institution
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing, Peoples R China
Keywords
Vision transformer; Frequency analysis; Image classification;
DOI
10.1016/j.patrec.2023.03.028
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision Transformers have shown overwhelming superiority in computer vision communities compared with convolutional neural networks. Nevertheless, the understanding of multi-head self-attention, as the de facto ingredient of Transformers, is still limited, which leads to surging interest in explaining its core ideology. A notable theory interprets that, unlike high-frequency-sensitive convolutions, self-attention behaves like generalized spatial smoothing and blurs high spatial-frequency signals as depth increases. In this paper, we design a Conv-MSA structure to extract efficient local contextual information and remedy this inherent drawback of self-attention. Accordingly, a flexible transformer structure named FlexFormer, with computational complexity linear in input image size, is proposed. Experimental results on several visual recognition benchmarks show that our FlexFormer achieves state-of-the-art results on visual recognition tasks with fewer parameters and higher computational efficiency. (c) 2023 Elsevier B.V. All rights reserved.
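The abstract's "generalized spatial smoothing" interpretation of self-attention can be illustrated with a minimal NumPy sketch (not the paper's code; all names here are illustrative). Because softmax attention weights are non-negative and each row sums to one, every output token is a convex combination of the value tokens, so outputs stay within the per-dimension range of the inputs, i.e. an averaging (low-pass) operation:

```python
# Minimal sketch: softmax self-attention as data-dependent spatial smoothing.
# Illustrative toy example only; dimensions and names are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 4                         # 8 tokens, 4-dim embeddings
x = rng.normal(size=(n, d))         # token features (toy input)

scores = x @ x.T / np.sqrt(d)       # scaled dot-product scores
attn = np.exp(scores)
attn /= attn.sum(axis=1, keepdims=True)   # softmax: rows sum to 1

out = attn @ x                      # each output is a convex combination of inputs

# Convexity implies smoothing: outputs cannot escape the inputs' range.
assert np.all(attn >= 0) and np.allclose(attn.sum(axis=1), 1.0)
assert np.all(out <= x.max(axis=0) + 1e-12)
assert np.all(out >= x.min(axis=0) - 1e-12)
```

Stacking such layers repeatedly averages the tokens, which is why, per the theory cited in the abstract, high-frequency detail is attenuated with depth; convolutions (as in the paper's Conv-MSA) can re-inject it.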
Pages: 95-101 (7 pages)
Related papers
50 records total
  • [31] Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance
    Fan, Lei
    Liu, Bo
    Li, Haoxiang
    Wu, Ying
    Hua, Gang
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1338 - 1347
  • [32] Development of flexible visual recognition memory in human infants
    Robinson, AJ
    Pascalis, O
    DEVELOPMENTAL SCIENCE, 2004, 7 (05) : 527 - 533
  • [33] TSE DeepLab: An efficient visual transformer for medical image segmentation
    Yang, Jingdong
    Tu, Jun
    Zhang, Xiaolin
    Yu, Shaoqing
    Zheng, Xianyou
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 80
  • [34] Adaptively bypassing vision transformer blocks for efficient visual tracking
    Yang, Xiangyang
    Zeng, Dan
    Wang, Xucheng
    Wu, You
    Ye, Hengzhou
    Zhao, Qijun
    Li, Shuiwang
    PATTERN RECOGNITION, 2025, 161
  • [35] Efficient visual transformer transferring from neural ODE perspective
    Niu, Hao
    Luo, Fengming
    Yuan, Bo
    Zhang, Yi
    Wang, Jianyong
    ELECTRONICS LETTERS, 2024, 60 (17)
  • [36] Structured Pruning for Efficient Visual Place Recognition
    Grainge, Oliver
    Milford, Michael
    Bodala, Indu
    Ramchurn, Sarvapali D.
    Ehsan, Shoaib
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2025, 10 (02): : 2024 - 2031
  • [37] Target Focused Shallow Transformer Framework for Efficient Visual Tracking
    Rahman, Md Maklachur
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23409 - 23410
  • [38] Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
    Yu, Zhou
    Jin, Zitian
    Yu, Jun
    Xu, Mingliang
    Wang, Hongbo
    Fan, Jianping
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9543 - 9556
  • [39] VST++: Efficient and Stronger Visual Saliency Transformer
    Liu, Nian
    Luo, Ziyang
    Zhang, Ni
    Han, Junwei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (11) : 7300 - 7316
  • [40] Efficient Mining of Optimal AND/OR Patterns for Visual Recognition
    Weng, Chaoqun
    Yuan, Junsong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2015, 17 (05) : 626 - 635