Adaptively bypassing vision transformer blocks for efficient visual tracking

Cited: 0
Authors
Yang, Xiangyang [1]
Zeng, Dan [2]
Wang, Xucheng [1,4]
Wu, You [1]
Ye, Hengzhou [1]
Zhao, Qijun [3]
Li, Shuiwang [1]
Affiliations
[1] Guilin Univ Technol, Coll Comp Sci & Engn, Guilin 541004, Peoples R China
[2] Sun Yat Sen Univ, Sch Artificial Intelligence, Zhuhai 510275, Peoples R China
[3] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[4] Fudan Univ, Sch Comp Sci, Shanghai 200082, Peoples R China
Keywords
Efficient visual tracking; Adaptively bypassing; Pruning
DOI
10.1016/j.patcog.2024.111278
CLC Number
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Empowered by transformer-based models, visual tracking has advanced significantly. However, the slow speed of current trackers limits their applicability on devices with constrained computational resources. To address this challenge, we introduce ABTrack, an adaptive computation framework that adaptively bypasses transformer blocks for efficient visual tracking. The rationale behind ABTrack is rooted in the observation that semantic features or relations do not uniformly impact the tracking task across all abstraction levels; rather, their impact varies with the characteristics of the target and the scene it occupies. Consequently, disregarding insignificant semantic features or relations at certain abstraction levels may not significantly affect tracking accuracy. We propose a Bypass Decision Module (BDM) that determines whether a transformer block should be bypassed, which adaptively simplifies the architecture of ViTs and thus speeds up inference. To offset the time cost incurred by the BDMs and further enhance the efficiency of ViTs, we introduce a novel ViT pruning method that reduces the dimension of the latent token representations in each transformer block. Extensive experiments on multiple tracking benchmarks validate the effectiveness and generality of the proposed method and show that it achieves state-of-the-art performance. Code is released at: https://github.com/xyyang317/ABTrack.
Pages: 11
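The per-block gating idea described in the abstract — a lightweight decision module that either executes a transformer block or skips it entirely — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the mean-pooled sigmoid gate in `bypass_decision`, the placeholder `transformer_block` update, and the 0.5 threshold are all hypothetical stand-ins for the actual BDM design.

```python
import numpy as np


def bypass_decision(tokens, w, b, threshold=0.5):
    """Hypothetical Bypass Decision Module (BDM): pool the token
    representations and predict whether this block should be skipped."""
    pooled = tokens.mean(axis=0)                      # (dim,) summary of all tokens
    score = 1.0 / (1.0 + np.exp(-(pooled @ w + b)))   # sigmoid gate in [0, 1]
    return score > threshold                          # True -> bypass the block


def transformer_block(tokens):
    """Stand-in for a real transformer block (attention + MLP)."""
    return tokens + 0.1 * np.tanh(tokens)             # placeholder residual update


def forward(tokens, gates):
    """Run a stack of blocks, bypassing those whose gate fires.

    `gates` holds one (w, b) pair per block; skipped blocks cost only
    the cheap gate evaluation, which is where the speedup comes from.
    """
    skipped = 0
    for w, b in gates:
        if bypass_decision(tokens, w, b):
            skipped += 1                              # block is bypassed entirely
            continue
        tokens = transformer_block(tokens)
    return tokens, skipped


rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))                 # 16 tokens, embedding dim 8
gates = [(rng.standard_normal(8), 0.0) for _ in range(4)]  # 4 blocks
out, skipped = forward(tokens, gates)
```

In a trained tracker the gate parameters would be learned jointly with the backbone, so that easy targets route through fewer blocks; here the random gates merely demonstrate the control flow.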