Adaptive search for broad attention based vision transformers

Cited by: 0
Authors
Li, Nannan [1 ,2 ,3 ]
Chen, Yaran [1 ,2 ]
Zhao, Dongbin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Tsinghua Univ, Dept Automat, BNRIST, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Vision transformer; Adaptive architecture search; Broad search space; Image classification; Broad learning;
DOI
10.1016/j.neucom.2024.128696
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Vision Transformer (ViT) has recently prevailed across computer vision tasks owing to its powerful capability for image representation. However, manually designing efficient ViT architectures is laborious, often involving repetitive trial and error, and the exploration of lightweight ViTs remains limited, leaving their performance inferior to that of convolutional neural networks. To tackle these challenges, we propose Adaptive Search for Broad attention based Vision Transformers (ASB), which automates the design of efficient ViT architectures by combining a broad search space with an adaptive evolutionary algorithm. The broad search space facilitates the exploration of a novel connection paradigm, enabling a more comprehensive integration of attention information that improves ViT performance. In addition, the adaptive evolutionary algorithm explores architectures efficiently by dynamically learning the probability distribution of candidate operators. Experimental results demonstrate that the adaptive evolution in ASB efficiently learns excellent lightweight models, converging 55% faster than traditional evolutionary algorithms. The effectiveness of ASB is further validated across several visual tasks: on ImageNet classification, the searched model reaches 77.8% accuracy with 6.5M parameters, outperforming state-of-the-art models including EfficientNet and EfficientViT; on mobile COCO panoptic segmentation, our approach delivers 43.7% PQ; and on mobile ADE20K semantic segmentation, it attains 40.9% mIoU. The code and pre-trained models will be available soon in ASB-Code.
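The abstract's core search idea — an evolutionary loop that adapts the sampling probabilities of candidate operators toward those found in high-fitness architectures — can be sketched roughly as follows. This is a minimal illustration under assumed details: the operator set, the elite-frequency update rule, and the single-slot "architecture" are all simplifying assumptions, not the paper's actual implementation.

```python
import random

# Hypothetical candidate operators for one searchable slot (illustrative only;
# the paper's real operator set is not given in the abstract).
OPERATORS = ["attention", "mlp", "conv3x3", "identity"]

def adaptive_evolution(fitness, generations=20, pop_size=8, lr=0.2, seed=0):
    """Evolutionary search that shifts the operator sampling distribution
    toward operators appearing in the best-scoring candidates."""
    rng = random.Random(seed)
    probs = {op: 1.0 / len(OPERATORS) for op in OPERATORS}  # uniform prior
    best = None
    for _ in range(generations):
        # Sample a population of candidate operators from the current distribution.
        pop = [rng.choices(OPERATORS, weights=[probs[o] for o in OPERATORS])[0]
               for _ in range(pop_size)]
        elites = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        # Move the distribution toward the empirical frequency of the elites.
        for op in OPERATORS:
            freq = elites.count(op) / len(elites)
            probs[op] = (1 - lr) * probs[op] + lr * freq
        total = sum(probs.values())
        probs = {op: p / total for op, p in probs.items()}  # renormalize
        cand = max(pop, key=fitness)
        if best is None or fitness(cand) > fitness(best):
            best = cand
    return best, probs
```

With a toy fitness that favors one operator, the learned distribution concentrates on it over generations; the claimed faster convergence comes from sampling effort shifting away from consistently poor operators.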
Pages: 12