Efficient Semisupervised Object Segmentation for Long-Term Videos Using Adaptive Memory Network

Authors
Zhong, Shan [1, 2, 3]
Li, Guoqiang [2]
Ying, Wenhao [1]
Zhao, Fuzhou [4]
Xie, Gengsheng [5]
Gong, Shengrong [1, 2, 3]
Affiliations
[1] Changshu Inst Technol, Sch Comp Sci & Engn, Changshu 215500, Jiangsu, Peoples R China
[2] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215000, Jiangsu, Peoples R China
[3] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130000, Peoples R China
[4] Changshu Inst Technol, Sch Automot Engn, Suzhou 215000, Jiangsu, Peoples R China
[5] Jiangxi Normal Univ, Sch Software, Nanchang 330022, Jiangxi, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Feature extraction; Videos; Object recognition; Data mining; Adaptation models; Adaptive systems; Video sequences; Long-term videos; memory network; object segmentation; semisupervised learning
DOI
10.1109/TCDS.2024.3385849
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video object segmentation (VOS) uses the annotated mask of the first video frame to achieve consistent and precise segmentation in subsequent frames. Recently, memory-based methods have received significant attention owing to their substantial performance gains. However, these approaches rely on a fixed global memory strategy, which limits segmentation accuracy and speed on longer videos. To alleviate this limitation, we propose a novel semisupervised VOS model based on an adaptive memory network. The model adaptively extracts object features by focusing on the object area while filtering out background noise. An identification mechanism is applied to distinguish each object in multiobject scenarios. To reduce storage consumption without compromising the saliency of object information, outdated features in the memory pool are compressed into salient features by a self-attention mechanism. Furthermore, we introduce a local matching module that refines object features by fusing contextual information from historical frames. Experiments demonstrate the efficiency of our approach, which substantially improves both the speed and precision of segmentation for long-term videos while maintaining comparable performance on short videos.
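To make the memory-compression idea in the abstract concrete, here is a minimal sketch of a bounded memory pool whose outdated features are condensed by self-attention. It is an illustration under stated assumptions, not the paper's implementation: the names `compress` and `AdaptiveMemory`, the saliency score (total attention a feature receives), the rule that the oldest half of the pool counts as outdated, and the plain-list feature vectors are all hypothetical choices made for this example.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def compress(features, num_salient):
    """Condense feature vectors into at most num_salient vectors (illustrative)."""
    n = len(features)
    if n <= num_salient:
        return [list(f) for f in features]
    d = len(features[0])
    scale = math.sqrt(d)
    # Self-attention weights: row i says how strongly feature i attends to each feature.
    attn = [softmax([dot(q, k) / scale for k in features]) for q in features]
    # Assumed saliency score: total attention a feature receives from all queries.
    saliency = [sum(attn[i][j] for i in range(n)) for j in range(n)]
    top = sorted(range(n), key=lambda j: saliency[j], reverse=True)[:num_salient]
    top.sort()  # preserve temporal order among the kept queries
    # Each kept query aggregates the whole outdated set with its attention weights.
    return [[sum(attn[i][j] * features[j][c] for j in range(n)) for c in range(d)]
            for i in top]

class AdaptiveMemory:
    """Bounded memory: recent features kept verbatim, older ones compressed."""

    def __init__(self, capacity, num_salient):
        self.capacity = capacity        # max number of uncompressed features
        self.num_salient = num_salient  # max number of compressed features
        self.salient = []
        self.recent = []

    def add(self, feature):
        self.recent.append(list(feature))
        if len(self.recent) > self.capacity:
            # Assumption: the oldest half of the recent features is "outdated".
            half = len(self.recent) // 2
            outdated = self.salient + self.recent[:half]
            self.recent = self.recent[half:]
            self.salient = compress(outdated, self.num_salient)

    def size(self):
        return len(self.salient) + len(self.recent)
```

With this layout the pool never holds more than `capacity + num_salient` vectors, however many frames are added, which is the property a fixed global memory lacks on long videos.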
Pages: 1789-1802
Page count: 14