Adaptive search for broad attention based vision transformers

Times Cited: 0
Authors
Li, Nannan [1 ,2 ,3 ]
Chen, Yaran [1 ,2 ]
Zhao, Dongbin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence Syst, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Tsinghua Univ, Dept Automat, BNRIST, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Vision transformer; Adaptive architecture search; Broad search space; Image classification; Broad learning;
DOI
10.1016/j.neucom.2024.128696
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Vision Transformers (ViTs) have recently prevailed across computer vision tasks owing to their powerful image-representation capability. Frustratingly, manually designing efficient ViT architectures is laborious, often involving repetitive trial and error. Furthermore, the exploration of lightweight ViTs remains limited, leaving their performance inferior to that of convolutional neural networks. To tackle these challenges, we propose Adaptive Search for Broad attention based Vision Transformers, called ASB, which automates the design of efficient ViT architectures by exploiting a broad search space and an adaptive evolutionary algorithm. The broad search space facilitates the exploration of a novel connection paradigm that integrates attention information more comprehensively and thereby improves ViT performance. In addition, the adaptive evolutionary algorithm explores architectures efficiently by dynamically learning the probability distribution of candidate operators. Our experimental results demonstrate that the adaptive evolution in ASB learns excellent lightweight models efficiently, converging 55% faster than a traditional evolutionary algorithm. The effectiveness of ASB is further validated on several visual tasks. For instance, on ImageNet classification, the searched model attains 77.8% accuracy with only 6.5M parameters, outperforming state-of-the-art models including EfficientNet and EfficientViT. On mobile COCO panoptic segmentation, our approach delivers 43.7% PQ; on mobile ADE20K semantic segmentation, it attains 40.9% mIoU. The code and pre-trained models will be available soon in ASB-Code.
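As a rough illustration of the broad connection paradigm the abstract describes, the sketch below fuses attention features from every transformer layer instead of relying on the last layer alone. This is a minimal PyTorch sketch under assumed details: the per-layer linear projections, the mean fusion rule, and the residual addition are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class BroadAttentionFusion(nn.Module):
    """Hedged sketch of a 'broad connection': attention features from every
    transformer layer are gathered and jointly fused, rather than using only
    the deepest layer's output. Projection-then-mean is an assumed fusion rule."""

    def __init__(self, dim: int, num_layers: int):
        super().__init__()
        # One illustrative linear projection per layer's attention feature.
        self.proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, layer_feats):
        # layer_feats: list of (batch, tokens, dim) tensors, one per layer.
        fused = torch.stack(
            [p(f) for p, f in zip(self.proj, layer_feats)], dim=0
        ).mean(dim=0)
        # The broad feature is added residually to the deepest representation.
        return layer_feats[-1] + fused

# Toy usage: four layers of 16 tokens with width 64.
feats = [torch.randn(2, 16, 64) for _ in range(4)]
out = BroadAttentionFusion(dim=64, num_layers=4)(feats)
print(out.shape)  # torch.Size([2, 16, 64])
```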
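Likewise, the adaptive evolutionary search can be pictured as a distribution-learning loop in the spirit of PBIL or the cross-entropy method: sample architectures from per-slot operator distributions, keep the elites, and shift the distributions toward them. Everything concrete below (operator names, slot count, the fitness stub, the update rate) is a hypothetical placeholder for the paper's actual search space and objective.

```python
import random

random.seed(0)

# Hypothetical operator vocabulary and search settings (not from the paper).
OPERATORS = ["attn_narrow", "attn_broad", "mlp_2x", "mlp_4x", "identity"]
NUM_SLOTS = 6      # searchable layer slots (illustrative)
POP_SIZE = 20
GENERATIONS = 30
LEARN_RATE = 0.3   # how quickly the distribution tracks elite samples

def fitness(arch):
    """Stand-in for the real objective (e.g., accuracy of a weight-sharing
    ViT subnet under a parameter budget); here just a dummy score."""
    return arch.count("attn_broad") + 0.5 * arch.count("mlp_2x") + random.random()

def sample(probs):
    """Draw one architecture: pick an operator per slot from its distribution."""
    return [random.choices(OPERATORS, weights=[p[op] for op in OPERATORS])[0]
            for p in probs]

# Start uniform over operators for every slot.
probs = [{op: 1.0 / len(OPERATORS) for op in OPERATORS} for _ in range(NUM_SLOTS)]

for gen in range(GENERATIONS):
    population = sorted((sample(probs) for _ in range(POP_SIZE)),
                        key=fitness, reverse=True)
    elites = population[: POP_SIZE // 4]
    # Adaptive step: move each slot's distribution toward the operators
    # that appear in this generation's elite architectures.
    for slot, p in enumerate(probs):
        for op in OPERATORS:
            freq = sum(a[slot] == op for a in elites) / len(elites)
            p[op] = (1 - LEARN_RATE) * p[op] + LEARN_RATE * freq

print("most likely architecture:", [max(p, key=p.get) for p in probs])
```

Learning the sampling distribution is what lets such a search concentrate evaluations on promising operators early, which is consistent with the faster convergence the abstract reports.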
Pages: 12
Related Papers
50 records in total
  • [1] Vision Transformers with Hierarchical Attention
    Liu, Yun
    Wu, Yu-Huan
    Sun, Guolei
    Zhang, Le
    Chhatkuli, Ajad
    Van Gool, Luc
    MACHINE INTELLIGENCE RESEARCH, 2024, 21 (04) : 670 - 683
  • [2] Constituent Attention for Vision Transformers
    Li, Haoling
    Xue, Mengqi
    Song, Jie
    Zhang, Haofei
    Huang, Wenqi
    Liang, Lingyu
    Song, Mingli
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 237
  • [3] AMixer: Adaptive Weight Mixing for Self-attention Free Vision Transformers
    Rao, Yongming
    Zhao, Wenliang
    Zhou, Jie
    Lu, Jiwen
    COMPUTER VISION, ECCV 2022, PT XXI, 2022, 13681 : 50 - 67
  • [4] BViT: Broad Attention-Based Vision Transformer
    Li, Nannan
    Chen, Yaran
    Li, Weifan
    Ding, Zixiang
    Zhao, Dongbin
    Nie, Shuai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (09) : 12772 - 12783
  • [5] An Attention-Based Token Pruning Method for Vision Transformers
    Luo, Kaicheng
    Li, Huaxiong
    Zhou, Xianzhong
    Huang, Bing
    ROUGH SETS, IJCRS 2022, 2022, 13633 : 274 - 288
  • [6] Adaptive Attention Span in Transformers
    Sukhbaatar, Sainbayar
    Grave, Edouard
    Bojanowski, Piotr
    Joulin, Armand
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 331 - 335
  • [7] Robustifying Token Attention for Vision Transformers
    Guo, Yong
    Stutz, David
    Schiele, Bernt
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 17511 - 17522
  • [8] Efficient Vision Transformers with Partial Attention
    Vo, Xuan-Thuy
    Nguyen, Duy-Linh
    Priadana, Adri
    Jo, Kang-Hyun
    COMPUTER VISION - ECCV 2024, PT LXXXIII, 2025, 15141 : 298 - 317
  • [9] Fast Vision Transformers with HiLo Attention
    Pan, Zizheng
    Cai, Jianfei
    Zhuang, Bohan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [10] DaViT: Dual Attention Vision Transformers
    Ding, Mingyu
    Xiao, Bin
    Codella, Noel
    Luo, Ping
    Wang, Jingdong
    Yuan, Lu
    COMPUTER VISION, ECCV 2022, PT XXIV, 2022, 13684 : 74 - 92