Temporal relation transformer for robust visual tracking with dual-memory learning

Cited by: 0
Authors
Nie, Guohao [1 ]
Wang, Xingmei [1 ,2 ]
Yan, Zining [1 ,3 ]
Xu, Xiaoyuan [1 ]
Liu, Bo [4 ]
Affiliations
[1] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Harbin Engn Univ, Natl Key Lab Underwater Acoust Technol, Harbin 150001, Peoples R China
[3] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore
[4] Key Lab Avion Syst Integrated Technol, Shanghai 200030, Peoples R China
Keywords
Visual tracking; Transformer; Temporal relation modeling; Memory mechanism; Object tracking
DOI
10.1016/j.asoc.2024.112229
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recently, transformer trackers have mostly associated multiple reference images with the search area to adapt to the changing appearance of the target. However, they ignore the learned cross-relations between the target and its surroundings, making it difficult to build coherent contextual models for specific target instances. This paper presents a Temporal Relation Transformer Tracker (TRTT) for robust visual tracking, providing a concise approach to modeling temporal relations through dual target memory learning. Specifically, a temporal relation transformer network generates paired memories based on static and dynamic templates, which are reinforced interactively. The memory contains implicit relation hints that capture the relations between the tracked object and its immediate surroundings. More importantly, to ensure consistency of target instance identities between frames, the relation hints from previous frames are transferred to the current frame and merged into the temporal contextual attention. Our method also incorporates mechanisms for reusing favorable cross-relations and instance-specific features, thereby overcoming background interference in complex spatio-temporal interactions through a sequential constraint. Furthermore, we design a memory token sparsification method that leverages the key points of the target to eliminate interference and optimize attention calculations. Extensive experiments demonstrate that our method surpasses advanced trackers on 8 challenging benchmarks while maintaining real-time running speed.
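The memory-readout and token-sparsification steps described in the abstract can be illustrated with a minimal NumPy sketch. This is an illustrative assumption, not the paper's actual architecture: the function names `cross_attention` and `sparsify_memory` are hypothetical, and tokens are ranked here by total received attention mass, whereas TRTT uses target key points for sparsification.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, memory):
    """Attend search-region tokens (Nq, d) over memory tokens (Nm, d)."""
    d = queries.shape[1]
    scores = softmax(queries @ memory.T / np.sqrt(d), axis=-1)  # (Nq, Nm)
    return scores @ memory, scores

def sparsify_memory(memory, scores, keep):
    """Keep the `keep` memory tokens that receive the most total attention."""
    importance = scores.sum(axis=0)              # (Nm,) attention mass per token
    idx = np.argsort(importance)[::-1][:keep]    # top-k token indices
    return memory[np.sort(idx)]                  # preserve original ordering

rng = np.random.default_rng(0)
search = rng.standard_normal((16, 8))   # 16 search-region tokens, dim 8
mem = rng.standard_normal((32, 8))      # 32 memory tokens (static + dynamic)
out, attn = cross_attention(search, mem)
mem_small = sparsify_memory(mem, attn, keep=8)  # pruned memory for the next frame
```

In this sketch the pruned memory would be carried to the next frame, mirroring how relation hints are transferred and merged across frames while keeping attention cost bounded.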
Pages: 16
Related Papers
50 records in total
  • [1] Learning Spatio-Temporal Transformer for Visual Tracking
    Yan, Bin
    Peng, Houwen
    Fu, Jianlong
    Wang, Dong
    Lu, Huchuan
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 10428 - 10437
  • [2] Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
    Wang, Ning
    Zhou, Wengang
    Wang, Jie
    Li, Houqiang
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1571 - 1580
  • [3] Reliable correlation tracking via dual-memory selection model
    Li, Guiji
    Peng, Manman
    Nai, Ke
    Li, Zhiyong
    Li, Keqin
    INFORMATION SCIENCES, 2020, 518 : 238 - 255
  • [4] A dual-memory architecture for reinforcement learning on neuromorphic platforms
    Olin-Ammentorp, Wilkie
    Sokolov, Yury
    Bazhenov, Maxim
    NEUROMORPHIC COMPUTING AND ENGINEERING, 2021, 1 (02):
  • [5] Memory Prompt for Spatio-Temporal Transformer Visual Object Tracking
    Xu, T.
    Wu, X.
    Zhu, X.
    Kittler, J.
    IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024, 5 (08): 1 - 6
  • [6] Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking
    Xu, Tianyang
    Pan, Yifan
    Feng, Zhenhua
    Zhu, Xuefeng
    Cheng, Chunyang
    Wu, Xiao-Jun
    Kittler, Josef
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (12) : 6021 - 6038
  • [7] Robust Visual Tracking with Dual Spatio-Temporal Context Trackers
    Sun, Shiyan
    Zhang, Hong
    Yuan, Ding
    SEVENTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2015), 2015, 9817
  • [8] Repformer: a robust shared-encoder dual-pipeline transformer for visual tracking
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    Zhu, Qidan
    Ju, Zhaojie
    NEURAL COMPUTING AND APPLICATIONS, 2023, 35 (28): 20581 - 20603
  • [9] Dual-Memory Model for Incremental Learning: The Handwriting Recognition Use Case
    Piot, Melanie
    Bourdoulous, Berangere
    Gonzalez, Jordan
    Deshayes, Aurelia
    Prevost, Lionel
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5527 - 5534