DMTrack: learning deformable masked visual representations for single object tracking

Cited by: 0

Authors:

Abdelaziz, Omar [1]

Shehata, Mohamed [1]

Affiliations:

[1] Univ British Columbia, Dept Comp Sci Math Phys & Stat, 3333 Univ Way, Kelowna, BC V1V1V7, Canada

Source:

SIGNAL IMAGE AND VIDEO PROCESSING | 2025, Vol. 19, No. 01

Keywords:

Single object tracking; Deformable convolutions; Vision transformers; One-stream trackers;

DOI:

10.1007/s11760-024-03713-0

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Single object tracking is still challenging because it requires localizing an arbitrary object in a sequence of frames, given only its appearance in the first frame of the sequence. Many trackers, especially those leveraging the Vision Transformer (ViT) backbone, have achieved superior performance. However, the gap between the performance metrics measured on the training data and those on the test data is still large. To alleviate this issue, we propose the deformable masking module for transformer-based trackers. The deformable masking module is injected after each layer of the ViT. First, it masks out complete vectors of the output representations of the ViT layer. After that, the masked representations are fed into a deformable convolution to reconstruct new reliable representations. The output of the last layer of the ViT is modified by fusing it with all intermediate outputs of the deformable masking modules to produce a final robust attentional feature map. We extensively evaluate the performance of our model, named DMTrack, on seven different tracking benchmarks. Our model outperforms the previous state-of-the-art techniques (+2%) while having fewer parameters (-92.4%). Moreover, our model matches the performance of much larger models in terms of parameters, indicating our training strategy's effectiveness.
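
The masking, reconstruction, and fusion steps described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the 3x3 kernel, the reshaping of tokens onto an H x W grid, the fixed (rather than learned) offsets, and fusion by simple averaging are all assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np

def mask_tokens(tokens, mask_ratio, rng):
    """Zero out complete token vectors. tokens: (N, C). Hypothetical helper."""
    n = tokens.shape[0]
    n_mask = int(round(mask_ratio * n))
    idx = rng.choice(n, size=n_mask, replace=False)
    masked = tokens.copy()
    masked[idx] = 0.0
    return masked, idx

def bilinear_sample(fmap, y, x):
    """Sample fmap (H, W, C) at a real-valued location; zero outside the grid."""
    h, w, c = fmap.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = np.zeros(c)
    for yy, xx in ((y0, x0), (y0, x0 + 1), (y0 + 1, x0), (y0 + 1, x0 + 1)):
        if 0 <= yy < h and 0 <= xx < w:
            out += (1 - abs(y - yy)) * (1 - abs(x - xx)) * fmap[yy, xx]
    return out

def deformable_conv3x3(fmap, weight, offsets):
    """Naive deformable 3x3 convolution.
    fmap: (H, W, C_in), weight: (9, C_in, C_out), offsets: (H, W, 9, 2) as (dy, dx).
    """
    h, w, _ = fmap.shape
    out = np.zeros((h, w, weight.shape[2]))
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    for i in range(h):
        for j in range(w):
            for k, (dy, dx) in enumerate(taps):
                oy, ox = offsets[i, j, k]
                # Each kernel tap reads from its regular position plus an offset.
                out[i, j] += bilinear_sample(fmap, i + dy + oy, j + dx + ox) @ weight[k]
    return out

def deformable_masking_module(tokens, grid_hw, weight, offsets, mask_ratio, rng):
    """Mask whole tokens, then rebuild representations with a deformable conv."""
    h, w = grid_hw
    masked, idx = mask_tokens(tokens, mask_ratio, rng)
    rebuilt = deformable_conv3x3(masked.reshape(h, w, -1), weight, offsets)
    return rebuilt.reshape(h * w, -1), idx

# Toy setup: 4x4 token grid, 8 channels.
rng = np.random.default_rng(0)
h, w, c = 4, 4, 8
tokens = rng.normal(size=(h * w, c))

# Assumption: identity weight on the centre tap and zero offsets, so the
# module output equals the masked input. A trained model would learn both.
weight = np.zeros((9, c, c))
weight[4] = np.eye(c)
offsets = np.zeros((h, w, 9, 2))
out, idx = deformable_masking_module(tokens, (h, w), weight, offsets, 0.25, rng)

# Shifting the centre tap by one column makes each position read its right neighbour.
fmap = tokens.reshape(h, w, c)
shift = offsets.copy()
shift[..., 4, 1] = 1.0
shifted = deformable_conv3x3(fmap, weight, shift)

# Assumption: fuse the last-layer output with intermediate module outputs by averaging.
last_layer_output = tokens
fused = np.mean([last_layer_output, out], axis=0)
```

The loops are written for readability rather than speed; a practical version would use a vectorized deformable convolution operator from a deep learning library and predict the offsets with a learned layer.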

Pages: 15

50 items in total

[1] Representation Learning for Visual Object Tracking by Masked Appearance Transfer
Zhao, Haojie
Wang, Dong
Lu, Huchuan
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 18696 - 18705
[2] Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking
Yu, Hongtao
Zhu, Pengfei
Zhang, Kaihua
Wang, Yu
Zhao, Shuai
Wang, Lei
Zhang, Tianzhu
Hu, Qinghua
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 5656 - 5670
[3] Visual Object Tracking with Autoencoder Representations
Besbinar, Beril
Alatan, A. Aydin
2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 2041 - 2044
[4] Accurate visual representation learning for single object tracking
Hua Bao
Ping Shu
Qijun Wang
Multimedia Tools and Applications, 2022, 81 : 24059 - 24079
[5] Accurate visual representation learning for single object tracking
Bao, Hua
Shu, Ping
Wang, Qijun
MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (17) : 24059 - 24079
[6] Single online visual object tracking with enhanced tracking and detection learning
Yang Yi
Liping Luo
Zhenxian Zheng
Multimedia Tools and Applications, 2019, 78 : 12333 - 12351
[7] Single online visual object tracking with enhanced tracking and detection learning
Yi, Yang
Luo, Liping
Zheng, Zhenxian
MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (09) : 12333 - 12351
[8] Masked Autoencoders as Single Object Tracking Learners
Bo, Chunjuan
Chen, Xin
Zhang, Junxing
CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 80 (01) : 1105 - 1122
[9] Deformable Siamese Attention Networks for Visual Object Tracking
Yu, Yuechen
Xiong, Yilei
Huang, Weilin
Scott, Matthew R.
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 6727 - 6736
[10] ROBUST TRAJECTORY TRACKING WITH OPTIMAL VISUAL SERVOING ON A DEFORMABLE OBJECT
Derrar, Yasser
Saidi, Farah
Malti, Abed
International Journal of Robotics and Automation, 2023, 38 (03) : 180 - 193