Learning rich feature representation and aggregation for accurate visual tracking

Cited by: 3
Authors
Yang, Yijin [1]
Gu, Xiaodong [1]
Affiliation
[1] Fudan Univ, Dept Elect Engn, Shanghai 200438, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visual object tracking; Tracking-by-segmentation; Feature representation and aggregation; Template update; Bounding box refinement; Siamese networks; Robust
DOI
10.1007/s10489-023-04998-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual tracking is a key component of computer vision and has a wide range of practical applications. Recently, the tracking-by-segmentation framework has been widely adopted in visual tracking because of its high accuracy. It borrows from the framework of video object segmentation to achieve accurate tracking. Although segmentation-based trackers are effective for target scale estimation, pixel-level segmentation places high demands on the quality of the extracted target features. Therefore, in this article, we propose a novel feature representation and aggregation network and introduce it into the tracking-by-segmentation framework to extract and integrate rich features for accurate and robust segmentation tracking. Specifically, the proposed approach first models three complementary feature representations: contextual semantic, local position, and structural patch features, obtained through cross-attention, cross-correlation, and dilated involution mechanisms, respectively. Second, these features are fused by a simple feature aggregation network. Third, the fused features are fed into the segmentation network to obtain an accurate target state estimate. In addition, to adapt the segmentation network to appearance changes and partial occlusion, we introduce a template update strategy and a bounding box refinement module for robust segmentation and tracking. Extensive experimental results on twelve challenging tracking benchmarks show that the proposed tracker outperforms most state-of-the-art trackers and achieves very promising tracking performance on the OTB100 and VOT2018 benchmarks.
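Illustrative note (not part of the original record): the abstract names cross-correlation as the mechanism behind the local position features. The sketch below is a minimal, single-channel illustration of that general idea only, assuming toy feature maps stored as nested lists. The function names `cross_correlate` and `peak_location` and all values are hypothetical, and nothing here reproduces the authors' network, its cross-attention or dilated involution branches, or its aggregation step.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def cross_correlate(search: Matrix, template: Matrix) -> Matrix:
    """Slide `template` over `search` (stride 1, no padding) and return the response map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    if th > sh or tw > sw:
        raise ValueError("template must not be larger than the search region")
    response: Matrix = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            # Inner product between the template and the window at offset (i, j).
            score = sum(
                search[i + di][j + dj] * template[di][dj]
                for di in range(th)
                for dj in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def peak_location(response: Matrix) -> Tuple[int, int]:
    """Return the (row, col) offset with the highest response."""
    best = max(
        ((value, (i, j)) for i, row in enumerate(response) for j, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1]


# Toy single-channel "feature maps" (hypothetical values, for illustration only).
template = [[1.0, 2.0],
            [3.0, 4.0]]
search = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 2.0, 0.0],
          [0.0, 3.0, 4.0, 0.0],
          [0.0, 0.0, 0.0, 0.0]]

response = cross_correlate(search, template)
print(peak_location(response))
```

In this toy case the response peaks at the offset where the template pattern sits inside the search region, which is the sense in which a correlation map can carry local position information. A real tracker would apply the same operation per channel to learned deep features rather than to hand-written values.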
Pages: 28114-28132
Number of pages: 19
Related papers
50 in total
  • [1] Learning rich feature representation and aggregation for accurate visual tracking
    Yijin Yang
    Xiaodong Gu
    Applied Intelligence, 2023, 53 : 28114 - 28132
  • [2] Multi Feature Representation and Aggregation Network for Accurate and Robust Visual Tracking
    Yang, Yijin
    Gu, Xiaodong
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [3] Learning Rich Feature Representation and State Change Monitoring for Accurate Animal Target Tracking
    Yin, Kuan
    Feng, Jiangfan
    Dong, Shaokang
    ANIMALS, 2024, 14 (06):
  • [4] Accurate visual representation learning for single object tracking
    Bao, Hua
    Shu, Ping
    Wang, Qijun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (17) : 24059 - 24079
  • [5] Accurate visual representation learning for single object tracking
    Hua Bao
    Ping Shu
    Qijun Wang
    Multimedia Tools and Applications, 2022, 81 : 24059 - 24079
  • [6] ADAPTIVE FEATURE REPRESENTATION FOR VISUAL TRACKING
    Han, Yuqi
    Deng, Chenwei
    Zhang, Zengshuo
    Li, Jiatong
    Zhao, Baojun
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 1867 - 1870
  • [7] Towards accurate estimation for visual object tracking with multi-hierarchy feature aggregation
    Wu, Jingjing
    Jiang, Jianguo
    Qi, Meibin
    Li, Xiaohong
    NEUROCOMPUTING, 2021, 451 : 252 - 264
  • [8] Dual Model Learning Combined With Multiple Feature Selection for Accurate Visual Tracking
    Zhang, Jianming
    Jin, Xiaokang
    Sun, Juan
    Wang, Jin
    Li, Keqin
    IEEE ACCESS, 2019, 7 : 43956 - 43969
  • [9] EnhanceCenter for improving point based tracking and rich feature representation
    Yang, Hyun-Sung
    Park, Sung-Wook
    Jung, Se-Hoon
    Sim, Chun-Bo
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [10] LEARNING A TEMPORALLY INVARIANT REPRESENTATION FOR VISUAL TRACKING
    Ma, Chao
    Yang, Xiaokang
    Zhang, Chongyang
    Yang, Ming-Hsuan
    2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015, : 857 - 861