Adaptively bypassing vision transformer blocks for efficient visual tracking

Cited by: 0
Authors
Yang, Xiangyang [1 ]
Zeng, Dan [2 ]
Wang, Xucheng [1 ,4 ]
Wu, You [1 ]
Ye, Hengzhou [1 ]
Zhao, Qijun [3 ]
Li, Shuiwang [1 ]
Affiliations
[1] Guilin Univ Technol, Coll Comp Sci & Engn, Guilin 541004, Peoples R China
[2] Sun Yat Sen Univ, Sch Artificial Intelligence, Zhuhai 510275, Peoples R China
[3] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[4] Fudan Univ, Sch Comp Sci, Shanghai 200082, Peoples R China
Keywords
Efficient visual tracking; Adaptively bypassing; Pruning
DOI
10.1016/j.patcog.2024.111278
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Empowered by transformer-based models, visual tracking has advanced significantly. However, the slow speed of current trackers limits their applicability on devices with constrained computational resources. To address this challenge, we introduce ABTrack, an adaptive computation framework that adaptively bypasses transformer blocks for efficient visual tracking. The rationale behind ABTrack is rooted in the observation that semantic features or relations do not uniformly impact the tracking task across all abstraction levels; rather, their impact varies with the characteristics of the target and the scene it occupies. Consequently, disregarding insignificant semantic features or relations at certain abstraction levels may not significantly affect tracking accuracy. We propose a Bypass Decision Module (BDM) to determine whether a transformer block should be bypassed, which adaptively simplifies the architecture of ViTs and thus speeds up inference. To offset the time cost incurred by the BDMs and further enhance the efficiency of ViTs, we introduce a novel ViT pruning method that reduces the dimension of the latent representation of tokens in each transformer block. Extensive experiments on multiple tracking benchmarks validate the effectiveness and generality of the proposed method and show that it achieves state-of-the-art performance. Code is released at: https://github.com/xyyang317/ABTrack.
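To make the bypass idea concrete, the following is a minimal, hypothetical PyTorch sketch of a per-input gate that wraps a transformer block and skips it at inference time when the gate's score falls below a threshold. The class names (BypassDecisionModule, BypassableBlock), the mean-pool-plus-linear gate, and the fixed threshold are illustrative assumptions, not the authors' implementation; the released code at https://github.com/xyyang317/ABTrack is authoritative.

import torch
import torch.nn as nn


class BypassDecisionModule(nn.Module):
    # Hypothetical gate: scores how much a block matters for the current input.
    def __init__(self, dim: int, threshold: float = 0.5):
        super().__init__()
        self.gate = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))
        self.threshold = threshold

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Pool the tokens and map to a keep-probability in [0, 1], shape (B, 1).
        return torch.sigmoid(self.gate(tokens.mean(dim=1)))


class BypassableBlock(nn.Module):
    # Wraps a transformer block; skips it when the gate deems it insignificant.
    def __init__(self, block: nn.Module, dim: int, threshold: float = 0.5):
        super().__init__()
        self.block = block
        self.bdm = BypassDecisionModule(dim, threshold)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        score = self.bdm(tokens)
        if not self.training and bool((score < self.bdm.threshold).all()):
            return tokens  # bypass: identity shortcut saves the block's compute
        return self.block(tokens)


# Usage sketch: wrap each block of a ViT-like backbone and run a forward pass.
dim = 384
backbone = nn.ModuleList(
    [BypassableBlock(nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True), dim)
     for _ in range(12)]
)
backbone.eval()
x = torch.randn(1, 196, dim)  # (batch, tokens, dim)
with torch.no_grad():
    for blk in backbone:
        x = blk(x)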
Pages: 11
Related Papers
50 records total
  • [1] VTST: Efficient Visual Tracking With a Stereoscopic Transformer
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    Zhu, Qidan
    Ju, Zhaojie
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (03): 2401-2416
  • [2] Hierarchical Vision and Language Transformer for Efficient Visual Dialog
    He, Qiangqiang
    Zhang, Mujie
    Zhang, Jie
    Yang, Shang
    Wang, Chongjun
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VI, 2023, 14259: 421-432
  • [3] ASAFormer: Visual tracking with convolutional vision transformer and asymmetric selective attention
    Gong, Xiaomei
    Zhang, Yi
    Hu, Shu
    KNOWLEDGE-BASED SYSTEMS, 2024, 291
  • [4] Target Focused Shallow Transformer Framework for Efficient Visual Tracking
    Rahman, Md Maklachur
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024: 23409-23410
  • [5] MIMTracking: Masked image modeling enhanced vision transformer for visual object tracking
    Zhang, Shuo
    Zhang, Dan
    Zou, Qi
    NEUROCOMPUTING, 2024, 606
  • [6] Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking
    Kang, Ben
    Chen, Xin
    Wang, Dong
    Peng, Houwen
    Lu, Huchuan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 9578-9587
  • [7] DPT-tracker: Dual pooling transformer for efficient visual tracking
    Fang, Yang
    Xie, Bailian
    Khairuddin, Uswah
    Min, Zijian
    Jiang, Bingbing
    Li, Weisheng
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2024, 9 (04): 948-959
  • [8] Exploiting temporal coherence for self-supervised visual tracking by using vision transformer
    Zhu, Wenjun
    Wang, Zuyi
    Xu, Li
    Meng, Jun
    KNOWLEDGE-BASED SYSTEMS, 2022, 251
  • [9] CVTrack: Combined Convolutional Neural Network and Vision Transformer Fusion Model for Visual Tracking
    Wang, Jian
    Song, Yueming
    Song, Ce
    Tian, Haonan
    Zhang, Shuai
    Sun, Jinghui
    SENSORS, 2024, 24 (01)