Adaptively bypassing vision transformer blocks for efficient visual tracking

Cited by: 0
Authors
Yang, Xiangyang [1 ]
Zeng, Dan [2 ]
Wang, Xucheng [1 ,4 ]
Wu, You [1 ]
Ye, Hengzhou [1 ]
Zhao, Qijun [3 ]
Li, Shuiwang [1 ]
Affiliations
[1] Guilin Univ Technol, Coll Comp Sci & Engn, Guilin 541004, Peoples R China
[2] Sun Yat Sen Univ, Sch Artificial Intelligence, Zhuhai 510275, Peoples R China
[3] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[4] Fudan Univ, Sch Comp Sci, Shanghai 200082, Peoples R China
Keywords
Efficient visual tracking; Adaptively bypassing; Pruning
DOI
10.1016/j.patcog.2024.111278
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Empowered by transformer-based models, visual tracking has advanced significantly. However, the slow speed of current trackers limits their applicability on devices with constrained computational resources. To address this challenge, we introduce ABTrack, an adaptive computation framework that adaptively bypasses transformer blocks for efficient visual tracking. The rationale behind ABTrack is rooted in the observation that semantic features or relations do not uniformly impact the tracking task across all abstraction levels; rather, their impact varies with the characteristics of the target and the scene it occupies. Consequently, disregarding insignificant semantic features or relations at certain abstraction levels may not significantly affect tracking accuracy. We propose a Bypass Decision Module (BDM) to determine whether a transformer block should be bypassed, which adaptively simplifies the architecture of ViTs and thus speeds up inference. To offset the time cost incurred by the BDMs and further enhance the efficiency of ViTs, we introduce a novel ViT pruning method that reduces the dimension of the latent representation of tokens in each transformer block. Extensive experiments on multiple tracking benchmarks validate the effectiveness and generality of the proposed method and show that it achieves state-of-the-art performance. Code is released at: https://github.com/xyyang317/ABTrack.
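To make the bypass idea concrete, the following is a minimal, hypothetical PyTorch sketch of a per-input gate that wraps a transformer block and skips it at inference time when the gate's score falls below a threshold. The class names (BypassDecisionModule, BypassableBlock), the mean-pool-plus-linear gate, and the fixed threshold are illustrative assumptions, not the authors' implementation; the released code at https://github.com/xyyang317/ABTrack is authoritative.

import torch
import torch.nn as nn


class BypassDecisionModule(nn.Module):
    # Hypothetical gate: scores how much a block matters for the current input.
    def __init__(self, dim: int, threshold: float = 0.5):
        super().__init__()
        self.gate = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))
        self.threshold = threshold

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Pool the tokens and map to a keep-probability in [0, 1], shape (B, 1).
        return torch.sigmoid(self.gate(tokens.mean(dim=1)))


class BypassableBlock(nn.Module):
    # Wraps a transformer block; skips it when the gate deems it insignificant.
    def __init__(self, block: nn.Module, dim: int, threshold: float = 0.5):
        super().__init__()
        self.block = block
        self.bdm = BypassDecisionModule(dim, threshold)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        score = self.bdm(tokens)
        if not self.training and bool((score < self.bdm.threshold).all()):
            return tokens  # bypass: identity shortcut saves the block's compute
        return self.block(tokens)


# Usage sketch: wrap each block of a ViT-like backbone and run a forward pass.
dim = 384
backbone = nn.ModuleList(
    [BypassableBlock(nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True), dim)
     for _ in range(12)]
)
backbone.eval()
x = torch.randn(1, 196, dim)  # (batch, tokens, dim)
with torch.no_grad():
    for blk in backbone:
        x = blk(x)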
Pages: 11
Related Papers
50 records total
  • [1] VTST: Efficient Visual Tracking With a Stereoscopic Transformer
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    Zhu, Qidan
    Ju, Zhaojie
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (03): 2401-2416
  • [2] Hierarchical Vision and Language Transformer for Efficient Visual Dialog
    He, Qiangqiang
    Zhang, Mujie
    Zhang, Jie
    Yang, Shang
    Wang, Chongjun
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VI, 2023, 14259: 421-432
  • [3] ASAFormer: Visual tracking with convolutional vision transformer and asymmetric selective attention
    Gong, Xiaomei
    Zhang, Yi
    Hu, Shu
    KNOWLEDGE-BASED SYSTEMS, 2024, 291
  • [4] Target Focused Shallow Transformer Framework for Efficient Visual Tracking
    Rahman, Md Maklachur
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024: 23409-23410
  • [5] MIMTracking: Masked image modeling enhanced vision transformer for visual object tracking
    Zhang, Shuo
    Zhang, Dan
    Zou, Qi
    NEUROCOMPUTING, 2024, 606
  • [6] Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking
    Kang, Ben
    Chen, Xin
    Wang, Dong
    Peng, Houwen
    Lu, Huchuan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 9578-9587
  • [7] DPT-tracker: Dual pooling transformer for efficient visual tracking
    Fang, Yang
    Xie, Bailian
    Khairuddin, Uswah
    Min, Zijian
    Jiang, Bingbing
    Li, Weisheng
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2024, 9 (04): 948-959
  • [8] Exploiting temporal coherence for self-supervised visual tracking by using vision transformer
    Zhu, Wenjun
    Wang, Zuyi
    Xu, Li
    Meng, Jun
    KNOWLEDGE-BASED SYSTEMS, 2022, 251
  • [9] CVTrack: Combined Convolutional Neural Network and Vision Transformer Fusion Model for Visual Tracking
    Wang, Jian
    Song, Yueming
    Song, Ce
    Tian, Haonan
    Zhang, Shuai
    Sun, Jinghui
    SENSORS, 2024, 24 (01)