Dual-Memory Feature Aggregation for Video Object Detection

Authors
Fan, Diwei [1, 2, 3]
Zheng, Huicheng [1, 2, 3]
Dang, Jisheng [1, 2, 3]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Minist Educ, Key Lab Machine Intelligence & Adv Comp, Guangzhou, Peoples R China
[3] Guangdong Prov Key Lab Informat Secur Technol, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
video object detection; feature aggregation; temporal information; global memory; local feature cache;
DOI
10.1007/978-981-99-8537-1_18
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent studies on video object detection have shown the advantages of aggregating features across frames to capture temporal information, which can mitigate appearance degradation, such as occlusion, motion blur, and defocus. However, these methods often employ a sliding window or memory queue to store temporal information frame by frame, leading to discarding features of earlier frames over time. To address this, we propose a dual-memory feature aggregation framework (DMFA). DMFA simultaneously constructs a local feature cache and a global feature memory in a feature-wise updating way at different granularities, i.e., pixel level and proposal level. This approach can partially preserve key features across frames. The local feature cache stores the spatio-temporal contexts from nearby frames to boost the localization capacity, while the global feature memory enhances semantic feature representation by capturing temporal information from all previous frames. Moreover, we introduce contrastive learning to improve the discriminability of temporal features, resulting in more accurate proposal-level feature aggregation. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the ImageNet VID benchmark.
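The contrast the abstract draws between frame-by-frame queues and feature-wise memory updates can be illustrated with a toy sketch. It is not the DMFA implementation: the `GlobalMemory` class, the score-based replacement rule, the similarity-weighted `aggregate` function, and the toy feature vectors are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def aggregate(query, memory):
    """Softmax(cosine similarity)-weighted average of the memory features."""
    if not memory:
        return list(query)
    sims = [cosine(query, m) for m in memory]
    top = max(sims)
    weights = [math.exp(s - top) for s in sims]
    total = sum(weights)
    return [
        sum(w * m[d] for w, m in zip(weights, memory)) / total
        for d in range(len(query))
    ]


class GlobalMemory:
    """Hypothetical fixed-size memory updated feature by feature.

    Assumption: a new feature replaces the lowest-scoring stored feature
    only if its own score is higher, so strong early features can survive.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []  # list of (score, feature) pairs

    def update(self, feature, score):
        if len(self.items) < self.capacity:
            self.items.append((score, feature))
            return
        weakest = min(range(len(self.items)), key=lambda i: self.items[i][0])
        if score > self.items[weakest][0]:
            self.items[weakest] = (score, feature)

    def features(self):
        return [f for _, f in self.items]


# Local cache: a plain FIFO queue that drops the oldest frame's feature.
local_cache = deque(maxlen=2)
global_memory = GlobalMemory(capacity=2)

# Toy stream of (feature, confidence score) pairs, one per frame.
frames = [
    ([1.0, 0.0], 0.9),
    ([0.0, 1.0], 0.2),
    ([0.1, 0.9], 0.3),
    ([0.2, 0.8], 0.4),
]
for feature, score in frames:
    local_cache.append(feature)
    global_memory.update(feature, score)

# Enhance the current frame's feature using both memories together.
query = [0.9, 0.1]
enhanced = aggregate(query, list(local_cache) + global_memory.features())
```

In this toy run the first frame's high-scoring feature has already left the FIFO cache after four frames, yet it remains in the feature-wise updated memory, which is the kind of preservation the abstract describes.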
Pages: 220-232
Page count: 13