DMTrack: learning deformable masked visual representations for single object tracking

Cited by: 0
Authors
Abdelaziz, Omar [1]
Shehata, Mohamed [1]
Affiliations
[1] Univ British Columbia, Dept Comp Sci Math Phys & Stat, 3333 Univ Way, Kelowna, BC V1V1V7, Canada
Keywords
Single object tracking; Deformable convolutions; Vision transformers; One-stream trackers
DOI
10.1007/s11760-024-03713-0
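Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract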
Single object tracking remains challenging because it requires localizing an arbitrary object in a sequence of frames, given only its appearance in the first frame. Many trackers, especially those built on the Vision Transformer (ViT) backbone, have achieved superior performance. However, the gap between the performance metrics measured on the training data and those measured on the test data is still large. To alleviate this issue, we propose a deformable masking module for transformer-based trackers. The module is injected after each layer of the ViT. First, it masks out complete vectors of the layer's output representations. The masked representations are then fed into a deformable convolution to reconstruct new, reliable representations. The output of the last ViT layer is fused with all intermediate outputs of the deformable masking modules to produce a final robust attentional feature map. We extensively evaluate our model, named DMTrack, on seven different tracking benchmarks. Our model outperforms the previous state-of-the-art techniques (+2%) while having fewer parameters (-92.4%). Moreover, it matches the performance of models with far more parameters, indicating the effectiveness of our training strategy.
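The two steps the abstract describes (masking complete representation vectors, then passing the result through a deformable convolution) can be illustrated with a minimal sketch. This is not the authors' implementation. It is a simplified, pure-Python, one-dimensional version under several assumptions: the convolution runs along the token axis rather than over a 2D feature map, the sampling offsets are supplied by the caller rather than learned, one scalar weight per kernel tap is shared across channels, and the names `mask_tokens`, `sample_linear`, and `deformable_conv1d` are hypothetical.

```python
import math
import random


def mask_tokens(tokens, mask_ratio, rng):
    """Zero out complete token vectors (not single channels), chosen at random."""
    n = len(tokens)
    dim = len(tokens[0])
    n_mask = int(round(mask_ratio * n))
    masked = set(rng.sample(range(n), n_mask))
    out = [[0.0] * dim if i in masked else list(tok)
           for i, tok in enumerate(tokens)]
    return out, masked


def sample_linear(tokens, pos):
    """Linearly interpolate the token sequence at a fractional position.

    Positions outside the sequence contribute zeros (zero padding).
    """
    dim = len(tokens[0])
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return tokens[i] if 0 <= i < len(tokens) else [0.0] * dim

    left, right = at(lo), at(lo + 1)
    return [(1.0 - frac) * a + frac * b for a, b in zip(left, right)]


def deformable_conv1d(tokens, weights, offsets):
    """Compute out[i] = sum_k weights[k] * tokens(i + k - K//2 + offsets[i][k]).

    Assumption: offsets are given; in a trained model they would be predicted.
    """
    k_size = len(weights)
    dim = len(tokens[0])
    out = []
    for i in range(len(tokens)):
        acc = [0.0] * dim
        for k, w in enumerate(weights):
            pos = i + k - k_size // 2 + offsets[i][k]
            sampled = sample_linear(tokens, pos)
            acc = [a + w * s for a, s in zip(acc, sampled)]
        out.append(acc)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [[float(i + 1), float(2 * (i + 1))] for i in range(6)]
    masked, idx = mask_tokens(tokens, mask_ratio=0.5, rng=rng)
    weights = [0.25, 0.5, 0.25]
    offsets = [[0.0, 0.0, 0.0] for _ in tokens]  # zero offsets: plain convolution
    rebuilt = deformable_conv1d(masked, weights, offsets)
    print("masked positions:", sorted(idx))
    print("reconstructed tokens:", rebuilt)
```

With zero offsets the operation reduces to an ordinary convolution. Non-zero offsets let each output position sample from shifted, possibly fractional, locations, which is how a masked position can draw on unmasked neighbours.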
Pages: 15