Adaptively bypassing vision transformer blocks for efficient visual tracking

Cited: 0
Authors
Yang, Xiangyang [1 ]
Zeng, Dan [2 ]
Wang, Xucheng [1 ,4 ]
Wu, You [1 ]
Ye, Hengzhou [1 ]
Zhao, Qijun [3 ]
Li, Shuiwang [1 ]
Affiliations
[1] Guilin Univ Technol, Coll Comp Sci & Engn, Guilin 541004, Peoples R China
[2] Sun Yat Sen Univ, Sch Artificial Intelligence, Zhuhai 510275, Peoples R China
[3] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[4] Fudan Univ, Sch Comp Sci, Shanghai 200082, Peoples R China
Keywords
Efficient visual tracking; Adaptively bypassing; Pruning
DOI
10.1016/j.patcog.2024.111278
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Empowered by transformer-based models, visual tracking has advanced significantly. However, the slow speed of current trackers limits their applicability on devices with constrained computational resources. To address this challenge, we introduce ABTrack, an adaptive computation framework that adaptively bypasses transformer blocks for efficient visual tracking. The rationale behind ABTrack is rooted in the observation that semantic features or relations do not uniformly impact the tracking task across all abstraction levels; rather, their impact varies with the characteristics of the target and the scene it occupies. Consequently, disregarding insignificant semantic features or relations at certain abstraction levels may not significantly affect tracking accuracy. We propose a Bypass Decision Module (BDM) to determine whether a transformer block should be bypassed, which adaptively simplifies the architecture of ViTs and thus speeds up inference. To offset the time cost incurred by the BDMs and further enhance the efficiency of ViTs, we introduce a novel ViT pruning method that reduces the dimension of the latent representation of tokens in each transformer block. Extensive experiments on multiple tracking benchmarks validate the effectiveness and generality of the proposed method and show that it achieves state-of-the-art performance. Code is released at: https://github.com/xyyang317/ABTrack.
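The abstract names two mechanisms: a Bypass Decision Module (BDM) that gates each transformer block, and a pruning step that narrows the latent token dimension per block. Below is a minimal PyTorch sketch of both ideas; the class names, the mean-pooled scoring head, the 0.5 threshold, and the down/up projections are all illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of the two ideas named in the abstract: a Bypass Decision
# Module (BDM) that can skip a transformer block, and a narrowed per-block
# token width standing in for the dimension-pruning step. All names and
# hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class BypassDecisionModule(nn.Module):
    """Lightweight head that scores whether the next block can be skipped."""

    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Linear(dim, 2)  # keep-vs-bypass logits

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Pool all tokens into one descriptor, then score it.
        logits = self.head(tokens.mean(dim=1))   # (B, 2)
        return logits.softmax(dim=-1)[:, 1]      # P(bypass), shape (B,)


class AdaptiveBlock(nn.Module):
    """Wraps one transformer block with a bypass gate and a narrowed width."""

    def __init__(self, dim: int, inner_dim: int, nhead: int,
                 threshold: float = 0.5):
        super().__init__()
        self.bdm = BypassDecisionModule(dim)
        self.down = nn.Linear(dim, inner_dim)    # reduced latent token width
        self.block = nn.TransformerEncoderLayer(inner_dim, nhead,
                                                batch_first=True)
        self.up = nn.Linear(inner_dim, dim)
        self.threshold = threshold

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # A hard skip like this is an inference-time view; training such
        # gates typically needs a differentiable relaxation (e.g.
        # Gumbel-softmax).
        if self.bdm(tokens).mean() > self.threshold:
            return tokens                        # bypass: identity shortcut
        return self.up(self.block(self.down(tokens)))


if __name__ == "__main__":
    layer = AdaptiveBlock(dim=256, inner_dim=192, nhead=3)
    out = layer(torch.randn(2, 64, 256))         # (batch, tokens, dim)
    print(out.shape)                             # torch.Size([2, 64, 256])
```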
Pages: 11
Related Papers
50 records in total
  • [41] Hunt-inspired Transformer for visual object tracking
    Zhang, Zhibin
    Xue, Wanli
    Zhou, Yuxi
    Zhang, Kaihua
    Chen, Shengyong
    PATTERN RECOGNITION, 2024, 156
  • [42] Learning Spatio-Temporal Transformer for Visual Tracking
    Yan, Bin
    Peng, Houwen
    Fu, Jianlong
    Wang, Dong
    Lu, Huchuan
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 10428 - 10437
  • [43] Visual tracking using transformer with a combination of convolution and attention
    Wang, Yuxuan
    Yan, Liping
    Feng, Zihang
    Xia, Yuanqing
    Xiao, Bo
    IMAGE AND VISION COMPUTING, 2023, 137
  • [44] Bidirectional Interaction of CNN and Transformer Feature for Visual Tracking
    Sun, Baozhen
    Wang, Zhenhua
    Wang, Shilei
    Cheng, Yongkang
    Ning, Jifeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7259 - 7271
  • [45] Pedestrian Head Detection and Tracking via Global Vision Transformer
Vo, Xuan-Thuy
Hoang, Van-Dung
Nguyen, Duy-Linh
Jo, Kang-Hyun
    FRONTIERS OF COMPUTER VISION (IW-FCV 2022), 2022, 1578 : 155 - 167
  • [46] Experiments And Discussions On Vision Transformer (ViT) Parameters For Object Tracking
    Fukushima, Daiki
    Ishikawa, Tomokazu
2022 NICOGRAPH INTERNATIONAL, NICOINT 2022, 2022: 64 - 68
  • [47] FlexFormer: Flexible Transformer for efficient visual recognition
    Fan, Xinyi
    Liu, Huajun
    PATTERN RECOGNITION LETTERS, 2023, 169 : 95 - 101
  • [48] Multi-tailed vision transformer for efficient inference
    Wang, Yunke
    Du, Bo
    Wang, Wenyuan
    Xu, Chang
    NEURAL NETWORKS, 2024, 174
  • [49] A-ViT: Adaptive Tokens for Efficient Vision Transformer
    Yin, Hongxu
    Vahdat, Arash
    Alvarez, Jose M.
    Mallya, Arun
    Kautz, Jan
    Molchanov, Pavlo
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 10799 - 10808
  • [50] Robust Visual Tracking Using Hierarchical Vision Transformer with Shifted Windows Multi-Head Self-Attention
    Gao, Peng
    Zhang, Xin-Yue
    Yang, Xiao-Li
    Ni, Jian-Cheng
    Wang, Fei
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (01) : 161 - 164