Dual-Memory Feature Aggregation for Video Object Detection

Cited by: 0
Authors
Fan, Diwei [1, 2, 3]
Zheng, Huicheng [1, 2, 3]
Dang, Jisheng [1, 2, 3]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Minist Educ, Key Lab Machine Intelligence & Adv Comp, Guangzhou, Peoples R China
[3] Guangdong Prov Key Lab Informat Secur Technol, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
video object detection; feature aggregation; temporal information; global memory; local feature cache;
DOI
10.1007/978-981-99-8537-1_18
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent studies on video object detection have shown the advantages of aggregating features across frames to capture temporal information, which can mitigate appearance degradation, such as occlusion, motion blur, and defocus. However, these methods often employ a sliding window or memory queue to store temporal information frame by frame, leading to discarding features of earlier frames over time. To address this, we propose a dual-memory feature aggregation framework (DMFA). DMFA simultaneously constructs a local feature cache and a global feature memory in a feature-wise updating way at different granularities, i.e., pixel level and proposal level. This approach can partially preserve key features across frames. The local feature cache stores the spatio-temporal contexts from nearby frames to boost the localization capacity, while the global feature memory enhances semantic feature representation by capturing temporal information from all previous frames. Moreover, we introduce contrastive learning to improve the discriminability of temporal features, resulting in more accurate proposal-level feature aggregation. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the ImageNet VID benchmark.
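The dual-memory idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the update or aggregation rules, so the class name `DualMemory`, the merge-into-most-similar-slot update, the softmax attention over cosine similarities, and every parameter (`local_frames`, `global_slots`, `temperature`) are assumptions chosen for illustration. The sketch works on proposal-level feature vectors only and omits the pixel-level branch and the contrastive loss.

```python
from collections import deque

import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


class DualMemory:
    """Hypothetical dual memory: a short local cache plus a fixed-size global memory."""

    def __init__(self, local_frames=3, global_slots=8):
        # Local cache: features of the most recent frames only (assumed FIFO).
        self.local = deque(maxlen=local_frames)
        self.global_slots = global_slots
        self.global_mem = None  # array of shape (<= global_slots, dim)

    def update(self, feats):
        """Store one frame's proposal features, shape (num_proposals, dim)."""
        feats = np.asarray(feats, dtype=np.float64)
        self.local.append(feats)
        for f in feats:
            self._update_global(f)

    def _update_global(self, f):
        # Feature-wise update (assumed rule): fill free slots first, then
        # merge each new feature into its most similar slot instead of
        # evicting a whole frame.
        if self.global_mem is None:
            self.global_mem = f[None, :].copy()
        elif len(self.global_mem) < self.global_slots:
            self.global_mem = np.vstack([self.global_mem, f])
        else:
            sims = l2_normalize(self.global_mem) @ l2_normalize(f)
            k = int(np.argmax(sims))
            self.global_mem[k] = 0.5 * (self.global_mem[k] + f)

    def aggregate(self, query, temperature=0.1):
        """Enhance query features, shape (n, dim), by attending over both memories."""
        query = np.asarray(query, dtype=np.float64)
        banks = list(self.local)
        if self.global_mem is not None:
            banks.append(self.global_mem)
        if not banks:
            return query  # nothing stored yet
        memory = np.vstack(banks)
        logits = l2_normalize(query) @ l2_normalize(memory).T / temperature
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)
        # Assumed fusion: average the query with its attended memory read-out.
        return 0.5 * (query + weights @ memory)
```

In this sketch the global memory stays at a fixed size however many frames are processed, which is one plausible reading of how features from earlier frames can be partially preserved rather than discarded frame by frame.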
Pages: 220-232
Page count: 13