DMTrack: learning deformable masked visual representations for single object tracking

Authors
Abdelaziz, Omar [1]
Shehata, Mohamed [1]
Affiliations
[1] Univ British Columbia, Dept Comp Sci Math Phys & Stat, 3333 Univ Way, Kelowna, BC V1V1V7, Canada
Keywords
Single object tracking; Deformable convolutions; Vision transformers; One-stream trackers;
DOI
10.1007/s11760-024-03713-0
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract
Single object tracking remains challenging because it requires localizing an arbitrary object throughout a sequence of frames, given only its appearance in the first frame. Many trackers, especially those built on a Vision Transformer (ViT) backbone, have achieved superior performance. However, the gap between the performance metrics measured on the training data and those measured on the test data is still large. To alleviate this issue, we propose a deformable masking module for transformer-based trackers. The deformable masking module is injected after each layer of the ViT. First, it masks out complete vectors of the output representations of the ViT layer. The masked representations are then fed into a deformable convolution to reconstruct new, reliable representations. The output of the last ViT layer is fused with all intermediate outputs of the deformable masking modules to produce a final robust attentional feature map. We extensively evaluate our model, named DMTrack, on seven different tracking benchmarks. Our model outperforms the previous state-of-the-art techniques by 2% while using 92.4% fewer parameters. Moreover, it matches the performance of models with far more parameters, indicating the effectiveness of our training strategy.
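The two steps the abstract names (masking complete token vectors, then passing them through a deformable convolution) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names, the 1-D token layout, the single shared weight per kernel tap, and the caller-supplied offsets are all assumptions made for illustration. In a real deformable convolution the offsets are predicted by a learned layer, and the fusion of intermediate outputs is not shown here.

```python
import math
import random


def mask_token_vectors(tokens, mask_ratio, rng):
    """Zero out complete token vectors (whole rows), never single channels.

    tokens: list of equal-length lists of floats (one row per token).
    Returns the masked copy and the set of masked row indices.
    """
    n = len(tokens)
    n_masked = int(round(mask_ratio * n))
    masked_idx = set(rng.sample(range(n), n_masked))
    masked = [
        [0.0] * len(row) if i in masked_idx else list(row)
        for i, row in enumerate(tokens)
    ]
    return masked, masked_idx


def sample_linear(tokens, pos):
    """Linearly interpolate a token vector at a fractional position.

    Positions outside the sequence read as zero vectors (zero padding).
    """
    n, dim = len(tokens), len(tokens[0])
    lo = math.floor(pos)
    frac = pos - lo

    def get(i):
        return tokens[i] if 0 <= i < n else [0.0] * dim

    left, right = get(lo), get(lo + 1)
    return [(1.0 - frac) * a + frac * b for a, b in zip(left, right)]


def deformable_conv1d(tokens, weights, offsets):
    """Simplified 1-D deformable convolution over a token sequence.

    Tap k of output i samples position i + (k - K // 2) + offsets[i][k].
    weights: K scalars shared across channels (hypothetical simplification).
    offsets: one list of K fractional offsets per token, given by the
             caller here instead of being predicted by a learned layer.
    """
    k_size = len(weights)
    dim = len(tokens[0])
    out = []
    for i in range(len(tokens)):
        acc = [0.0] * dim
        for k in range(k_size):
            pos = i + (k - k_size // 2) + offsets[i][k]
            sampled = sample_linear(tokens, pos)
            acc = [a + weights[k] * s for a, s in zip(acc, sampled)]
        out.append(acc)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    masked, idx = mask_token_vectors(feats, 0.5, rng)
    zero_offsets = [[0.0, 0.0, 0.0] for _ in feats]
    print(sorted(idx))
    print(deformable_conv1d(masked, [0.25, 0.5, 0.25], zero_offsets))
```

With all offsets set to zero the operation reduces to an ordinary convolution; non-zero offsets let each output position read from neighbouring tokens at fractional positions, which is how a masked position can be rebuilt from unmasked context.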
Pages: 15