Dual-Memory Feature Aggregation for Video Object Detection

Cited by: 0
Authors
Fan, Diwei [1, 2, 3]
Zheng, Huicheng [1, 2, 3]
Dang, Jisheng [1, 2, 3]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Minist Educ, Key Lab Machine Intelligence & Adv Comp, Guangzhou, Peoples R China
[3] Guangdong Prov Key Lab Informat Secur Technol, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
video object detection; feature aggregation; temporal information; global memory; local feature cache;
DOI
10.1007/978-981-99-8537-1_18
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Recent studies on video object detection have shown the advantages of aggregating features across frames to capture temporal information, which can mitigate appearance degradation, such as occlusion, motion blur, and defocus. However, these methods often employ a sliding window or memory queue to store temporal information frame by frame, leading to discarding features of earlier frames over time. To address this, we propose a dual-memory feature aggregation framework (DMFA). DMFA simultaneously constructs a local feature cache and a global feature memory in a feature-wise updating way at different granularities, i.e., pixel level and proposal level. This approach can partially preserve key features across frames. The local feature cache stores the spatio-temporal contexts from nearby frames to boost the localization capacity, while the global feature memory enhances semantic feature representation by capturing temporal information from all previous frames. Moreover, we introduce contrastive learning to improve the discriminability of temporal features, resulting in more accurate proposal-level feature aggregation. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the ImageNet VID benchmark.
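The contrast the abstract draws between frame-by-frame storage and feature-wise updating can be illustrated with a small sketch. The abstract does not state DMFA's actual update rule, so everything below is an assumption made for illustration: the class names `FrameQueue` and `FeatureWiseMemory`, the cosine-similarity merge rule, and the `merge_threshold` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class FrameQueue:
    """Frame-by-frame memory: only the last `max_frames` frames survive."""

    def __init__(self, max_frames):
        self.frames = deque(maxlen=max_frames)

    def update(self, frame_features):
        self.frames.append([list(f) for f in frame_features])

    def features(self):
        return [f for frame in self.frames for f in frame]


class FeatureWiseMemory:
    """Fixed-capacity memory updated per feature (hypothetical rule).

    A new feature that closely matches a stored slot is averaged into it;
    otherwise it fills a free slot, or overwrites the most similar slot
    when the memory is full.
    """

    def __init__(self, capacity, merge_threshold=0.9):
        self.capacity = capacity
        self.merge_threshold = merge_threshold
        self.slots = []

    def update(self, frame_features):
        for feat in frame_features:
            best = None
            if self.slots:
                sims = [cosine(feat, s) for s in self.slots]
                best = max(range(len(sims)), key=sims.__getitem__)
                if sims[best] >= self.merge_threshold:
                    # Redundant feature: refresh the matching slot.
                    self.slots[best] = [
                        (a + b) / 2 for a, b in zip(self.slots[best], feat)
                    ]
                    continue
            if len(self.slots) < self.capacity:
                self.slots.append(list(feat))
            else:
                # Full memory, novel feature: overwrite the closest slot.
                self.slots[best] = list(feat)


# Toy stream: one distinctive feature in frame 0, then near-duplicates.
frames = [[[1.0, 0.0]], [[0.0, 1.0]], [[0.1, 1.0]], [[0.0, 0.9]], [[0.2, 1.0]]]
queue = FrameQueue(max_frames=2)
memory = FeatureWiseMemory(capacity=2)
for frame in frames:
    queue.update(frame)
    memory.update(frame)
```

In this toy stream the queue ends up holding only the features of the last two frames, so the distinctive feature from frame 0 is gone, while the feature-wise memory still holds it because later near-duplicate features were merged into a single slot rather than pushing older entries out.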
Pages: 220-232
Number of pages: 13
Related papers
50 in total
  • [1] DualFeat: Dual Feature Aggregation for Video Object Detection
    Pan, Jing
    Du, Kaiwen
    Yan, Yan
    Wang, Hanzi
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2901 - 2905
  • [2] Adaptive Feature Aggregation for Video Object Detection
    Qian, Yijun
    Yu, Lijun
    Liu, Wenhe
    Kang, Guoliang
    Hauptmann, Alexander G.
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW), 2020, : 143 - 147
  • [3] Exploiting Better Feature Aggregation for Video Object Detection
    Han, Liang
    Wang, Pichao
    Yin, Zhaozheng
    Wang, Fan
    Li, Hao
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1469 - 1477
  • [4] MIDFA: Memory-Based Instance Division and Feature Aggregation Network for Video Object Detection
    Chen, Qiaochuan
    Zhou, Min
    Yu, Hang
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2023, PT III, 2023, 13937 : 153 - 164
  • [5] Video Object Detection Using Motion Context and Feature Aggregation
    Kim, Jaekyum
    Koh, Junho
    Choi, Jun Won
    11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020), 2020, : 269 - 272
  • [6] Flow-Guided Feature Aggregation for Video Object Detection
    Zhu, Xizhou
    Wang, Yujie
    Dai, Jifeng
    Yuan, Lu
    Wei, Yichen
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 408 - 417
  • [7] Temporal Context Enhanced Feature Aggregation for Video Object Detection
    He, Fei
    Gao, Naiyu
    Li, Qiaozhe
    Du, Senyao
    Zhao, Xin
    Huang, Kaiqi
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 10941 - 10948
  • [8] Guided Sampling Based Feature Aggregation for Video Object Detection
    Liang, Jun
    Chen, Haosheng
    Yan, Yan
    Lu, Yang
    Wang, Hanzi
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1116 - 1120
  • [9] Temporal-adaptive sparse feature aggregation for video object detection
    He, Fei
    Li, Qiaozhe
    Zhao, Xin
    Huang, Kaiqi
    PATTERN RECOGNITION, 2022, 127
  • [10] Attention-Guided Disentangled Feature Aggregation for Video Object Detection
    Muralidhara, Shishir
    Hashmi, Khurram Azeem
    Pagani, Alain
    Liwicki, Marcus
    Stricker, Didier
    Afzal, Muhammad Zeshan
    SENSORS, 2022, 22 (21)