DMTrack: learning deformable masked visual representations for single object tracking

Cited: 0
Authors
Abdelaziz, Omar [1 ]
Shehata, Mohamed [1 ]
Affiliation
[1] University of British Columbia, Department of Computer Science, Mathematics, Physics and Statistics, 3333 University Way, Kelowna, BC V1V 1V7, Canada
Keywords
Single object tracking; Deformable convolutions; Vision transformers; One-stream trackers
DOI
10.1007/s11760-024-03713-0
CLC classification: TM (Electrical engineering); TN (Electronic technology and communication technology)
Discipline codes: 0808; 0809
Abstract
Single object tracking remains challenging because it requires localizing an arbitrary object across a sequence of frames given only its appearance in the first frame. Many trackers, especially those built on a Vision Transformer (ViT) backbone, have achieved superior performance. However, the gap between the performance metrics measured on the training data and those on the test data is still large. To alleviate this issue, we propose a deformable masking module for transformer-based trackers. The deformable masking module is injected after each layer of the ViT. First, it masks out complete vectors of the output representations of the ViT layer. The masked representations are then fed into a deformable convolution to reconstruct new, reliable representations. The output of the last layer of the ViT is fused with all intermediate outputs of the deformable masking modules to produce a final robust attentional feature map. We extensively evaluate the performance of our model, named DMTrack, on seven different tracking benchmarks. Our model outperforms the previous state-of-the-art techniques by +2% while having 92.4% fewer parameters. Moreover, our model matches the performance of models with far more parameters, indicating our training strategy's effectiveness.
Pages: 15