Relation-Guided Multi-stage Feature Aggregation Network for Video Object Detection

Cited by: 0

Authors:

Yao, Tingting [1]

Cao, Fuxiao [1]

Mi, Fuheng [1]

Li, Danmeng [1]

Affiliation:

[1] Dalian Maritime Univ, Coll Informat Sci & Technol, Dalian 116026, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI | 2024 / Vol. 14430

Funding:

National Natural Science Foundation of China;

Keywords:

Video object detection; Temporal context information; Feature aggregation; Temporal relation-guided;

DOI:

10.1007/978-981-99-8537-1_12

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video object detection has received extensive research attention, and various methods have been proposed. The quality of a single frame in the original video is often degraded by motion blur and object occlusion, which leads to detection failures. Although some methods have attempted to enhance the feature representation of each frame by aggregating temporal context information from other frames, existing methods are usually sensitive to changes in object appearance and scale, which leads to false or missed detections. In this paper, we therefore propose a Relation-guided Multi-stage Feature Aggregation (RMFA) network for video object detection. First, a Multi-Stage Feature Aggregation (MSFA) framework is devised to aggregate the feature representations of global and local support frames in each stage, so that both global semantic information and local motion information can be better captured. Furthermore, a Multi-sources Feature Aggregation (MFA) module is proposed to enhance the quality of the support frames and thereby improve the feature representation of the current frame. Finally, a Temporal Relation-Guided (TRG) module is proposed to improve the feature aggregation perception by supervising the semantic similarity relationships between different object proposals, which enhances the model's ability to selectively store valuable features. Qualitative and quantitative experimental results on the ImageNet VID dataset demonstrate that our model achieves superior video object detection results compared with a number of state-of-the-art methods. In particular, when the object is occluded or under fast motion, our model shows outstanding performance.
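As a rough illustration of the general idea of aggregating temporal context from support frames, the following minimal sketch enhances a current-frame feature vector with a similarity-weighted average of support-frame features. It is not the RMFA, MSFA, MFA, or TRG design from the paper; the function names (`cosine`, `softmax`, `aggregate`), the use of cosine similarity, and the residual addition are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(current, supports):
    """Hypothetical aggregation: add a similarity-weighted average of the
    support features to the current feature (assumed residual form)."""
    weights = softmax([cosine(current, s) for s in supports])
    context = [
        sum(w * s[i] for w, s in zip(weights, supports))
        for i in range(len(current))
    ]
    enhanced = [c + x for c, x in zip(current, context)]
    return enhanced, weights


# Toy example: one support feature matches the current one, the other is orthogonal.
current = [1.0, 0.0]
supports = [[1.0, 0.0], [0.0, 1.0]]
enhanced, weights = aggregate(current, supports)
```

In this toy case the matching support feature receives the larger weight, so it contributes more to the enhanced feature than the dissimilar one does.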

Pages: 146-157

Number of pages: 12
