Learning rich feature representation and aggregation for accurate visual tracking

Cited by: 3
Authors
Yang, Yijin [1]
Gu, Xiaodong [1]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200438, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visual object tracking; Tracking-by-segmentation; Feature representation and aggregation; Template update; Bounding box refinement; Siamese networks; Robust
DOI
10.1007/s10489-023-04998-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual tracking is a key component of computer vision with a wide range of practical applications. Recently, the tracking-by-segmentation framework has been widely adopted in visual tracking because of its high accuracy. It borrows from the video object segmentation framework to achieve accurate tracking. Although segmentation-based trackers estimate target scale effectively, pixel-level segmentation places high demands on the quality of the extracted target features. Therefore, in this article, we propose a novel feature representation and aggregation network and introduce it into the tracking-by-segmentation framework to extract and integrate rich features for accurate and robust segmentation tracking. Specifically, the proposed approach first models three complementary feature representations (contextual semantic, local position, and structural patch) through cross-attention, cross-correlation, and dilated involution mechanisms, respectively. Second, these features are fused by a simple feature aggregation network. Third, the fused features are fed into the segmentation network to obtain an accurate target state estimate. In addition, to adapt the segmentation network to appearance changes and partial occlusion, we introduce a template update strategy and a bounding box refinement module for robust segmentation and tracking. Extensive experimental results on twelve challenging tracking benchmarks show that the proposed tracker outperforms most state-of-the-art trackers and achieves very promising tracking performance on the OTB100 and VOT2018 benchmarks.
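As a rough illustration of two of the mechanisms named in the abstract, the sketch below shows generic textbook forms of depth-wise cross-correlation and scaled dot-product cross-attention between template and search-region features. It is not the authors' implementation: the function names, the nested-list tensor layout, and the toy inputs are assumptions made for this example, and the dilated involution branch and the aggregation network are not sketched because the abstract gives no details about them.

```python
import math


def depthwise_xcorr(search, template):
    """Slide each template channel over the matching search channel ("valid" mode).

    search:   [C][Hs][Ws] nested lists (assumed layout for this sketch)
    template: [C][Ht][Wt] nested lists
    returns:  [C][Hs-Ht+1][Ws-Wt+1] response maps, one per channel
    """
    out = []
    for s_ch, t_ch in zip(search, template):
        hs, ws = len(s_ch), len(s_ch[0])
        ht, wt = len(t_ch), len(t_ch[0])
        resp = [
            [
                sum(s_ch[i + u][j + v] * t_ch[u][v]
                    for u in range(ht) for v in range(wt))
                for j in range(ws - wt + 1)
            ]
            for i in range(hs - ht + 1)
        ]
        out.append(resp)
    return out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    Here search-region tokens are assumed to act as queries and
    template tokens as keys/values; each token is a list of floats.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


# Toy inputs: one 3x3 search channel and one 2x2 template channel.
search = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
template = [[[1.0, 0.0], [0.0, 1.0]]]
response = depthwise_xcorr(search, template)  # [[[6.0, 8.0], [12.0, 14.0]]]

# One search token attending to two template tokens.
attended = cross_attention([[1.0, 0.0]],
                           [[1.0, 0.0], [0.0, 1.0]],
                           [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup the correlation response peaks where the search patch best matches the template, which corresponds to the "local position" cue, while the attention output is a template-weighted mixture for each search token, which corresponds to the "contextual semantic" cue. A practical tracker would compute both on batched tensors in a deep learning framework rather than on nested lists.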
Pages: 28114-28132
Number of pages: 19