Adaptive search for broad attention based vision transformers

Cited by: 0
|
Authors
Li, Nannan [1 ,2 ,3 ]
Chen, Yaran [1 ,2 ]
Zhao, Dongbin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence Syst, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Tsinghua Univ, Dept Automat, BNRIST, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Vision transformer; Adaptive architecture search; Broad search space; Image classification; Broad learning;
DOI
10.1016/j.neucom.2024.128696
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision Transformer (ViT) has recently prevailed across computer vision tasks owing to its powerful capability for image representation. Frustratingly, manually designing efficient ViT architectures is laborious, often involving repetitive trial-and-error. Furthermore, the exploration of lightweight ViTs remains limited, leaving them with inferior performance compared to convolutional neural networks. To tackle these challenges, we propose Adaptive Search for Broad attention based Vision Transformers (ASB), which automates the design of efficient ViT architectures by combining a broad search space with an adaptive evolutionary algorithm. The broad search space enables the exploration of a novel connection paradigm that integrates attention information more comprehensively and thereby improves ViT performance. In addition, the adaptive evolutionary algorithm explores architectures efficiently by dynamically learning the probability distribution over candidate operators. Our experimental results demonstrate that the adaptive evolution in ASB efficiently learns excellent lightweight models, converging 55% faster than traditional evolutionary algorithms. The effectiveness of ASB is further validated across several visual tasks. On ImageNet classification, the searched model reaches 77.8% accuracy with only 6.5M parameters, outperforming state-of-the-art models including EfficientNet and EfficientViT networks. On mobile COCO panoptic segmentation, our approach delivers 43.7% PQ, and on mobile ADE20K semantic segmentation it attains 40.9% mIoU. The code and pre-trained models will be available soon in ASB-Code.
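The adaptive evolutionary step described in the abstract, learning a probability distribution over candidate operators and sampling architectures from it, can be pictured with the following minimal Python sketch. This is an illustration under assumed details, not the authors' ASB implementation: the operator names, the fixed depth, the placeholder fitness function, and all hyperparameters are hypothetical stand-ins for the paper's broad search space and validation-accuracy reward.

import random

# Hypothetical per-layer candidate operators; the paper's broad search
# space is richer than this flat list.
CANDIDATE_OPS = ["broad_attn", "local_attn", "mlp", "identity"]
NUM_LAYERS = 6     # assumed fixed depth
POP_SIZE = 20      # architectures sampled per generation
TOP_K = 5          # elites used to update the distribution
LR = 0.2           # adaptation rate of the distribution
GENERATIONS = 30

def sample_arch(dist):
    # One operator per layer, drawn from that layer's current distribution.
    return [random.choices(CANDIDATE_OPS, weights=dist[layer])[0]
            for layer in range(NUM_LAYERS)]

def fitness(arch):
    # Placeholder: the real search would train/evaluate the candidate ViT
    # (e.g., validation accuracy, possibly penalized by parameter count).
    return sum(op != "identity" for op in arch) + random.random()

def update_dist(dist, elites):
    # Adaptive step: move each layer's distribution toward the empirical
    # operator frequencies among the top-K architectures.
    for layer in range(NUM_LAYERS):
        counts = [sum(arch[layer] == op for arch in elites)
                  for op in CANDIDATE_OPS]
        total = float(sum(counts))
        for i, c in enumerate(counts):
            dist[layer][i] = (1 - LR) * dist[layer][i] + LR * (c / total)

def search():
    # Start from a uniform distribution over operators at every layer.
    dist = [[1.0 / len(CANDIDATE_OPS)] * len(CANDIDATE_OPS)
            for _ in range(NUM_LAYERS)]
    best_arch, best_score = None, float("-inf")
    for _ in range(GENERATIONS):
        population = [sample_arch(dist) for _ in range(POP_SIZE)]
        scored = sorted(((fitness(a), a) for a in population),
                        key=lambda t: t[0], reverse=True)
        update_dist(dist, [a for _, a in scored[:TOP_K]])
        if scored[0][0] > best_score:
            best_score, best_arch = scored[0]
    return best_arch, best_score

if __name__ == "__main__":
    print(search())

In this kind of distribution-guided evolution, sampling gradually concentrates on operators that keep appearing in top-ranked architectures, which is one plausible way a speed-up over a uniform-mutation evolutionary baseline, like the 55% faster convergence reported above, could arise.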
Pages: 12
Related Papers
50 records in total
  • [41] Wei, Cong; Duke, Brendan; Jiang, Ruowei; Aarabi, Parham; Taylor, Graham W.; Shkurti, Florian. Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 22680-22689.
  • [42] Shi, Lili; Huang, Haiduo; Song, Bowei; Tan, Meng; Zhao, Wenzhe; Xia, Tian; Ren, Pengju. TAQ: Top-K Attention-Aware Quantization for Vision Transformers. 2023 IEEE International Conference on Image Processing (ICIP), 2023: 1750-1754.
  • [43] Jiban, Md Jibanul Haque; Mahalanobis, Abhijit; Lobo, Niels Da Vitoria. Vision Transformers with Cross-Attention Pyramids for Class-Agnostic Counting. 2024 9th International Conference on Signal and Image Processing (ICSIP), 2024: 689-695.
  • [44] Baili, Nada; Frigui, Hichem. ADA-ViT: Attention-Guided Data Augmentation for Vision Transformers. 2023 IEEE International Conference on Image Processing (ICIP), 2023: 385-389.
  • [45] Syed, Toqeer Ali; Nauman, Mohammad; Khan, Sohail; Jan, Salman; Zuhairi, Megat F. ViTDroid: Vision Transformers for Efficient, Explainable Attention to Malicious Behavior in Android Binaries. Sensors, 2024, 24 (20).
  • [46] Zhou, Qiqi; Zhu, Yichen. Make a Long Image Short: Adaptive Token Length for Vision Transformers. Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023), Pt II, 2023, 14170: 69-85.
  • [47] Cho, Yunsung; Yun, Jungmin; Kwon, Junehyoung; Kim, Youngbin. Domain-Adaptive Vision Transformers for Generalizing Across Visual Domains. IEEE Access, 2023, 11: 115644-115653.
  • [48] Cauteruccio, Francesco; Marchetti, Michele; Traini, Davide; Ursino, Domenico; Virgili, Luca. Adaptive Patch Selection to Improve Vision Transformers through Reinforcement Learning. Applied Intelligence, 2025, 55 (07).
  • [49] Dong, Peiyan; Sun, Mengshu; Lu, Alec; Xie, Yanyue; Liu, Kenneth; Kong, Zhenglun; Meng, Xin; Li, Zhengang; Lin, Xue; Fang, Zhenman; Wang, Yanzhi. HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023: 442-455.
  • [50] Shi, Shuiling; Liu, Wenqi. B2-ViT Net: Broad Vision Transformer Network With Broad Attention for Seizure Prediction. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2024, 32: 178-188.