DMTrack: learning deformable masked visual representations for single object tracking

Cited: 0
Authors
Abdelaziz, Omar [1 ]
Shehata, Mohamed [1 ]
Affiliation
[1] University of British Columbia, Department of Computer Science, Mathematics, Physics and Statistics, 3333 University Way, Kelowna, BC V1V 1V7, Canada
Keywords
Single object tracking; Deformable convolutions; Vision transformers; One-stream trackers
DOI
10.1007/s11760-024-03713-0
CLC classification: TM (Electrical engineering); TN (Electronic technology and communication technology)
Discipline codes: 0808; 0809
Abstract
Single object tracking remains challenging because it requires localizing an arbitrary object across a sequence of frames given only its appearance in the first frame. Many trackers, especially those built on a Vision Transformer (ViT) backbone, have achieved superior performance. However, the gap between the performance metrics measured on the training data and those on the test data is still large. To alleviate this issue, we propose a deformable masking module for transformer-based trackers. The deformable masking module is injected after each layer of the ViT. First, it masks out complete vectors of the output representations of the ViT layer. The masked representations are then fed into a deformable convolution to reconstruct new, reliable representations. The output of the last layer of the ViT is fused with all intermediate outputs of the deformable masking modules to produce a final robust attentional feature map. We extensively evaluate the performance of our model, named DMTrack, on seven different tracking benchmarks. Our model outperforms the previous state-of-the-art techniques by +2% while having 92.4% fewer parameters. Moreover, our model matches the performance of models with far more parameters, indicating our training strategy's effectiveness.
Pages: 15