Relation-Guided Multi-stage Feature Aggregation Network for Video Object Detection

Cited by: 0
Authors
Yao, Tingting [1]
Cao, Fuxiao [1]
Mi, Fuheng [1]
Li, Danmeng [1]
Affiliations
[1] Dalian Maritime Univ, Coll Informat Sci & Technol, Dalian 116026, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video object detection; Temporal context information; Feature aggregation; Temporal relation-guided;
DOI
10.1007/978-981-99-8537-1_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The video object detection task has received extensive research attention, and various methods have been proposed. The quality of a single frame in the original video is often degraded by motion blur and object occlusion, which leads to detection failures. Although some methods have attempted to enhance the feature representation of each frame by aggregating temporal context information from other frames, existing methods are usually sensitive to changes in object appearance and scale, which leads to false or missed detections. Therefore, in this paper, we propose a Relation-guided Multi-stage Feature Aggregation (RMFA) network for video object detection. First, a Multi-Stage Feature Aggregation (MSFA) framework is devised to aggregate the feature representations of global and local support frames in each stage. In this way, both global semantic information and local motion information can be better captured. Furthermore, a Multi-sources Feature Aggregation (MFA) module is proposed to enhance the quality of the support frames, so that the feature representation of the current frame can be improved. Finally, a Temporal Relation-Guided (TRG) module is proposed to improve the feature aggregation perception by supervising the semantic similarity relationships between different object proposals. As a result, the model's ability to selectively store valuable features is enhanced. Qualitative and quantitative experimental results on the ImageNet VID dataset demonstrate that our model achieves superior video object detection results compared with a number of state-of-the-art methods. In particular, when an object is occluded or undergoing fast motion, our model performs especially well.
Pages: 146-157
Page count: 12
Related papers
50 in total
  • [21] DUALFEAT: DUAL FEATURE AGGREGATION FOR VIDEO OBJECT DETECTION
    Pan, Jing
    Du, Kaiwen
    Yan, Yan
    Wang, Hanzi
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2901 - 2905
  • [22] Exploiting Better Feature Aggregation for Video Object Detection
    Han, Liang
    Wang, Pichao
    Yin, Zhaozheng
    Wang, Fan
    Li, Hao
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1469 - 1477
  • [23] Feature aggregation network for small object detection
    Jing, Rudong
    Zhang, Wei
    Li, Yuzhuo
    Li, Wenlin
    Liu, Yanyan
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
  • [24] Knowledge graph representation learning with relation-guided aggregation and interaction
    Shang, Bin
    Zhao, Yinliang
    Liu, Jun
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (04)
  • [25] Multi-stage Reinforcement Learning for Object Detection
    Koenig, Jonas
    Malberg, Simon
    Martens, Martin
    Niehaus, Sebastian
    Krohn-Grimberghe, Artus
    Ramaswamy, Arunselvan
    ADVANCES IN COMPUTER VISION, CVC, VOL 1, 2020, 943 : 178 - 191
  • [26] Patchwise Temporal-Spatial Feature Aggregation Network for Object Detection in Satellite Video
    Zheng, Shangdong
    Wu, Zebin
    Xu, Yang
    Liu, Pengfei
    Zheng, Peng
    Wei, Zhihui
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21
  • [27] Multi-Scale Residual Aggregation Feature Pyramid Network for Object Detection
    Wang, Hongyang
    Wang, Tiejun
    ELECTRONICS, 2023, 12 (01)
  • [28] A Multi-stage Network for Improving the Sample Quality in Aerial Image Object Detection
    Han, Wei
    Feng, Ruyi
    Wang, Lizhe
    Li, Fengpeng
    Wu, Lin
    IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020, : 4076 - 4079
  • [29] HEAD DETECTION BASED ON CONVOLUTIONAL NEURAL NETWORK WITH MULTI-STAGE WEIGHTED FEATURE
    Rui, Ting
    Fei, Jian-chao
    Cui, Peng
    Zhou, You
    Fang, Hu-sheng
    2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, 2015, : 147 - 150
  • [30] Multi-attention guided feature fusion network for salient object detection
    Li, Anni
    Qi, JinQing
    Lu, Huchuan
    NEUROCOMPUTING, 2020, 411 : 416 - 427