Adaptively bypassing vision transformer blocks for efficient visual tracking

Cited: 0
Authors
Yang, Xiangyang [1 ]
Zeng, Dan [2 ]
Wang, Xucheng [1 ,4 ]
Wu, You [1 ]
Ye, Hengzhou [1 ]
Zhao, Qijun [3 ]
Li, Shuiwang [1 ]
Affiliations
[1] Guilin Univ Technol, Coll Comp Sci & Engn, Guilin 541004, Peoples R China
[2] Sun Yat Sen Univ, Sch Artificial Intelligence, Zhuhai 510275, Peoples R China
[3] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[4] Fudan Univ, Sch Comp Sci, Shanghai 200082, Peoples R China
Keywords
Efficient visual tracking; Adaptively bypassing; Pruning
DOI
10.1016/j.patcog.2024.111278
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Empowered by transformer-based models, visual tracking has advanced significantly. However, the slow speed of current trackers limits their applicability on devices with constrained computational resources. To address this challenge, we introduce ABTrack, an adaptive computation framework that adaptively bypasses transformer blocks for efficient visual tracking. The rationale behind ABTrack is rooted in the observation that semantic features or relations do not uniformly impact the tracking task across all abstraction levels; rather, their impact varies with the characteristics of the target and the scene it occupies. Consequently, disregarding insignificant semantic features or relations at certain abstraction levels may not significantly affect tracking accuracy. We propose a Bypass Decision Module (BDM) to determine whether a transformer block should be bypassed, which adaptively simplifies the architecture of ViTs and thus speeds up inference. To offset the time cost incurred by the BDMs and further enhance the efficiency of ViTs, we introduce a novel ViT pruning method that reduces the dimension of the latent representation of tokens in each transformer block. Extensive experiments on multiple tracking benchmarks validate the effectiveness and generality of the proposed method and show that it achieves state-of-the-art performance. Code is released at: https://github.com/xyyang317/ABTrack.
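The abstract names two mechanisms: a Bypass Decision Module (BDM) that gates each transformer block, and a pruning step that narrows the latent token dimension per block. Below is a minimal PyTorch sketch of both ideas; the class names, the mean-pooled scoring head, the 0.5 threshold, and the down/up projections are all illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of the two ideas named in the abstract: a Bypass Decision
# Module (BDM) that can skip a transformer block, and a narrowed per-block
# token width standing in for the dimension-pruning step. All names and
# hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class BypassDecisionModule(nn.Module):
    """Lightweight head that scores whether the next block can be skipped."""

    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Linear(dim, 2)  # keep-vs-bypass logits

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Pool all tokens into one descriptor, then score it.
        logits = self.head(tokens.mean(dim=1))   # (B, 2)
        return logits.softmax(dim=-1)[:, 1]      # P(bypass), shape (B,)


class AdaptiveBlock(nn.Module):
    """Wraps one transformer block with a bypass gate and a narrowed width."""

    def __init__(self, dim: int, inner_dim: int, nhead: int,
                 threshold: float = 0.5):
        super().__init__()
        self.bdm = BypassDecisionModule(dim)
        self.down = nn.Linear(dim, inner_dim)    # reduced latent token width
        self.block = nn.TransformerEncoderLayer(inner_dim, nhead,
                                                batch_first=True)
        self.up = nn.Linear(inner_dim, dim)
        self.threshold = threshold

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # A hard skip like this is an inference-time view; training such
        # gates typically needs a differentiable relaxation (e.g.
        # Gumbel-softmax).
        if self.bdm(tokens).mean() > self.threshold:
            return tokens                        # bypass: identity shortcut
        return self.up(self.block(self.down(tokens)))


if __name__ == "__main__":
    layer = AdaptiveBlock(dim=256, inner_dim=192, nhead=3)
    out = layer(torch.randn(2, 64, 256))         # (batch, tokens, dim)
    print(out.shape)                             # torch.Size([2, 64, 256])
```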
Pages: 11
Related Papers
50 records in total
  • [41] Hunt-inspired Transformer for visual object tracking
    Zhang, Zhibin
    Xue, Wanli
    Zhou, Yuxi
    Zhang, Kaihua
    Chen, Shengyong
    PATTERN RECOGNITION, 2024, 156
  • [42] Learning Spatio-Temporal Transformer for Visual Tracking
    Yan, Bin
    Peng, Houwen
    Fu, Jianlong
    Wang, Dong
    Lu, Huchuan
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 10428 - 10437
  • [43] Visual tracking using transformer with a combination of convolution and attention
    Wang, Yuxuan
    Yan, Liping
    Feng, Zihang
    Xia, Yuanqing
    Xiao, Bo
    IMAGE AND VISION COMPUTING, 2023, 137
  • [44] Bidirectional Interaction of CNN and Transformer Feature for Visual Tracking
    Sun, Baozhen
    Wang, Zhenhua
    Wang, Shilei
    Cheng, Yongkang
    Ning, Jifeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7259 - 7271
  • [45] Pedestrian Head Detection and Tracking via Global Vision Transformer
Vo, Xuan-Thuy
Hoang, Van-Dung
Nguyen, Duy-Linh
Jo, Kang-Hyun
    FRONTIERS OF COMPUTER VISION (IW-FCV 2022), 2022, 1578 : 155 - 167
  • [46] Experiments And Discussions On Vision Transformer (ViT) Parameters For Object Tracking
    Fukushima, Daiki
    Ishikawa, Tomokazu
2022 NICOGRAPH INTERNATIONAL, NICOINT 2022, 2022: 64 - 68
  • [47] FlexFormer: Flexible Transformer for efficient visual recognition
    Fan, Xinyi
    Liu, Huajun
    PATTERN RECOGNITION LETTERS, 2023, 169 : 95 - 101
  • [48] Multi-tailed vision transformer for efficient inference
    Wang, Yunke
    Du, Bo
    Wang, Wenyuan
    Xu, Chang
    NEURAL NETWORKS, 2024, 174
  • [49] A-ViT: Adaptive Tokens for Efficient Vision Transformer
    Yin, Hongxu
    Vahdat, Arash
    Alvarez, Jose M.
    Mallya, Arun
    Kautz, Jan
    Molchanov, Pavlo
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 10799 - 10808
  • [50] Robust Visual Tracking Using Hierarchical Vision Transformer with Shifted Windows Multi-Head Self-Attention
    Gao, Peng
    Zhang, Xin-Yue
    Yang, Xiao-Li
    Ni, Jian-Cheng
    Wang, Fei
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (01) : 161 - 164