Relation-Guided Multi-stage Feature Aggregation Network for Video Object Detection

Times cited: 0
Authors
Yao, Tingting [1]
Cao, Fuxiao [1]
Mi, Fuheng [1]
Li, Danmeng [1]
Affiliations
[1] Dalian Maritime Univ, Coll Informat Sci & Technol, Dalian 116026, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video object detection; Temporal context information; Feature aggregation; Temporal relation-guided;
DOI
10.1007/978-981-99-8537-1_12
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Video object detection has received extensive research attention, and various methods have been proposed. The quality of a single frame in the original video is often degraded by motion blur and object occlusion, which leads to detection failures. Although some methods attempt to enhance the feature representation of each frame by aggregating temporal context information from other frames, they are usually sensitive to changes in object appearance and scale, which leads to false or missed detections. Therefore, in this paper, we propose a Relation-guided Multi-stage Feature Aggregation (RMFA) network for video object detection. First, a Multi-Stage Feature Aggregation (MSFA) framework is devised to aggregate the feature representations of global and local support frames in each stage. In this way, both global semantic information and local motion information can be better captured. Furthermore, a Multi-sources Feature Aggregation (MFA) module is proposed to enhance the quality of the support frames, which in turn improves the feature representation of the current frame. Finally, a Temporal Relation-Guided (TRG) module is proposed to improve the feature aggregation perception by supervising the semantic similarity relationships between different object proposals. As a result, the model's ability to selectively store valuable features is enhanced. Qualitative and quantitative experimental results on the ImageNet VID dataset demonstrate that our model achieves superior video object detection results compared with a number of state-of-the-art methods. In particular, our model shows outstanding performance when objects are occluded or in fast motion.
Pages: 146-157
Number of pages: 12
Related papers
50 in total
  • [31] Motion cues guided feature aggregation and enhancement for video object segmentation
    Li, Xuejun
    Zheng, Wenming
    Zong, Yuan
    NEUROCOMPUTING, 2022, 493 : 176 - 190
  • [32] A Multi-Scale Learnable Feature Alignment Network for Video Object Detection
    Wang, Rui
    2024 IEEE 21ST INTERNATIONAL CONFERENCE ON MOBILE AD-HOC AND SMART SYSTEMS, MASS 2024, 2024, : 496 - 501
  • [33] Object Guided External Memory Network for Video Object Detection
    Deng, Hanming
    Hua, Yang
    Song, Tao
    Zhang, Zongpu
    Xue, Zhengui
    Ma, Ruhui
    Robertson, Neil
    Guan, Haibing
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 6677 - 6686
  • [34] RELATION-GUIDED NETWORK FOR IMAGE-TEXT RETRIEVAL
    Yang, Yulou
    Shen, Hao
    Yang, Ming
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1856 - 1860
  • [35] Lightweight Saliency Object Detection Guided by Deep Feature Aggregation
    Li, Junwen
    Zhang, Hongying
    Han, Bin
    Computer Engineering and Applications, 2023, 59 (19) : 122 - 129
  • [36] Dual-Memory Feature Aggregation for Video Object Detection
    Fan, Diwei
    Zheng, Huicheng
    Dang, Jisheng
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI, 2024, 14430 : 220 - 232
  • [37] Video Object Detection Using Motion Context and Feature Aggregation
    Kim, Jaekyum
    Koh, Junho
    Choi, Jun Won
    11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020), 2020, : 269 - 272
  • [38] Temporal Context Enhanced Feature Aggregation for Video Object Detection
    He, Fei
    Gao, Naiyu
    Li, Qiaozhe
    Du, Senyao
    Zhao, Xin
    Huang, Kaiqi
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 10941 - 10948
  • [39] Multi-stage Tag Guidance Network in Video Caption
    Wang, Lanxiao
    Shang, Chao
    Qiu, Heqian
    Zhao, Taijin
    Qiu, Benliu
    Li, Hongliang
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4610 - 4614
  • [40] Feature Aggregation and Propagation Network for Camouflaged Object Detection
    Zhou, Tao
    Zhou, Yi
    Gong, Chen
    Yang, Jian
    Zhang, Yu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 7036 - 7047