DMTrack: learning deformable masked visual representations for single object tracking

Cited: 0
Authors
Abdelaziz, Omar [1 ]
Shehata, Mohamed [1 ]
Institutions
[1] Univ British Columbia, Dept Comp Sci Math Phys & Stat, 3333 Univ Way, Kelowna, BC V1V1V7, Canada
Keywords
Single object tracking; Deformable convolutions; Vision transformers; One-stream trackers;
DOI
10.1007/s11760-024-03713-0
CLC classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject classification codes
0808; 0809;
Abstract
Single object tracking remains challenging because it requires localizing an arbitrary object in a sequence of frames, given only its appearance in the first frame of the sequence. Many trackers, especially those leveraging a Vision Transformer (ViT) backbone, have achieved superior performance. However, the gap between the performance metrics measured on the training data and those on the test data is still large. To alleviate this issue, we propose a deformable masking module for transformer-based trackers. The deformable masking module is injected after each layer of the ViT. First, it masks out complete vectors of the output representations of the ViT layer. The masked representations are then fed into a deformable convolution to reconstruct new, reliable representations. The output of the last layer of the ViT is fused with all intermediate outputs of the deformable masking modules to produce a final robust attentional feature map. We extensively evaluate the performance of our model, named DMTrack, on seven different tracking benchmarks. Our model outperforms previous state-of-the-art techniques by +2% while having 92.4% fewer parameters. Moreover, our model matches the performance of much larger models, indicating the effectiveness of our training strategy.
Pages: 15