Learning rich feature representation and aggregation for accurate visual tracking

Cited by: 3

Authors:

Yang, Yijin [1]

Gu, Xiaodong [1]

Affiliations:

[1] Fudan Univ, Dept Elect Engn, Shanghai 200438, Peoples R China

Source:

APPLIED INTELLIGENCE | 2023 / Vol. 53 / Issue 23

Funding:

National Natural Science Foundation of China;

Keywords:

Visual object tracking; Tracking-by-segmentation; Feature representation and aggregation; Template update; Bounding box refinement; SIAMESE NETWORKS; ROBUST;

DOI:

10.1007/s10489-023-04998-3

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual tracking is a key component of computer vision and has a wide range of practical applications. Recently, the tracking-by-segmentation framework has been widely applied in visual tracking because of its high accuracy. It borrows from the video object segmentation framework to achieve accurate tracking. Although segmentation-based trackers are effective for target scale estimation, pixel-level segmentation places high demands on the quality of the extracted target features. Therefore, in this article, we propose a novel feature representation and aggregation network and introduce it into the tracking-by-segmentation framework to extract and integrate rich features for accurate and robust segmentation tracking. Specifically, the proposed approach first models three complementary feature representations (contextual semantic, local position, and structural patch) through cross-attention, cross-correlation, and dilated involution mechanisms, respectively. Second, these features are fused by a simple feature aggregation network. Third, the fused features are fed into the segmentation network to obtain an accurate target state estimate. In addition, to adapt the segmentation network to appearance changes and partial occlusion, we introduce a template update strategy and a bounding box refinement module for robust segmentation and tracking. Extensive experimental results on twelve challenging tracking benchmarks show that the proposed tracker outperforms most state-of-the-art trackers and achieves very promising tracking performance on the OTB100 and VOT2018 benchmarks.
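
The abstract names cross-correlation and cross-attention as two of the mechanisms used to relate the target template to the search region, followed by a simple aggregation step. The sketch below is only an illustration of those generic operations in plain Python, not the authors' network: the function names, the single-channel toy feature maps, and the use of concatenation as the fusion step are all assumptions, and the dilated involution branch is omitted.

```python
import math

def cross_correlation(search, template):
    """Slide a 2D template over a 2D search map (no padding, stride 1)."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            row.append(sum(search[i + a][j + b] * template[a][b]
                           for a in range(th) for b in range(tw)))
        response.append(row)
    return response

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query (search token)
    attends to all keys/values (template tokens)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

def aggregate(*features):
    """Hypothetical stand-in for feature fusion: plain concatenation."""
    fused = []
    for f in features:
        fused.extend(f)
    return fused

# Toy example: the template pattern sits at row 1, column 1 of the search map.
search = [[0, 0, 0, 0],
          [0, 1, 2, 0],
          [0, 3, 4, 0],
          [0, 0, 0, 0]]
template = [[1, 2],
            [3, 4]]
response = cross_correlation(search, template)

# One search token attending to two template tokens.
attended = cross_attention([[1.0, 0.0]],
                           [[1.0, 0.0], [0.0, 1.0]],
                           [[1.0, 0.0], [0.0, 1.0]])

# Fuse the correlation peak with the attended feature vector.
fused = aggregate([response[1][1]], attended[0])
```

In this toy setup the correlation response peaks where the template overlaps its own pattern in the search map, and the attention output leans toward the template token most similar to the query. A real tracker would apply these operations to multi-channel deep features and learn the fusion weights.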

Citation

Pages: 28114 - 28132

Page count: 19

50 records in total

[21] Adversarial Feature Sampling Learning for Efficient Visual Tracking
Yin, Yingjie
Xu, De
Wang, Xingang
Zhang, Lei
IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2020, 17 (02) : 847 - 857
[22] Exploiting multi-scale hierarchical feature representation for visual tracking
Jun Wang
Peng Yin
Wenhui Yang
Yuanyun Wang
Shengqian Wang
Complex & Intelligent Systems, 2024, 10 : 3617 - 3632
[23] Exploiting multi-scale hierarchical feature representation for visual tracking
Wang, Jun
Yin, Peng
Yang, Wenhui
Wang, Yuanyun
Wang, Shengqian
COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (03) : 3617 - 3632
[24] Multi-Label Visual Feature Learning with Attentional Aggregation
Guan, Ziqiao
Yager, Kevin G.
Yu, Dantong
Qin, Hong
2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 2190 - 2198
[25] Polysemious visual representation based on feature aggregation for large scale image applications
Xinghang Song
Shuqiang Jiang
Shuhui Wang
Liang Li
Qingming Huang
Multimedia Tools and Applications, 2015, 74 : 595 - 611
[26] Polysemious visual representation based on feature aggregation for large scale image applications
Song, Xinghang
Jiang, Shuqiang
Wang, Shuhui
Li, Liang
Huang, Qingming
MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (02) : 595 - 611
[27] Learning deep convolutional descriptor aggregation for efficient visual tracking
Xiao Ke
Yuezhou Li
Wenzhong Guo
Yanyan Huang
Neural Computing and Applications, 2022, 34 : 3745 - 3765
[28] Learning deep convolutional descriptor aggregation for efficient visual tracking
Ke, Xiao
Li, Yuezhou
Guo, Wenzhong
Huang, Yanyan
NEURAL COMPUTING & APPLICATIONS, 2022, 34 (05) : 3745 - 3765
[29] Memory Storable Network Based Feature Aggregation for Speaker Representation Learning
Gu, Bin
Guo, Wu
Zhang, Jie
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 643 - 655
[30] Joint Correlation and Attention Based Feature Fusion Network for Accurate Visual Tracking
Yang, Yijin
Gu, Xiaodong
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1705 - 1715
