Efficient Semisupervised Object Segmentation for Long-Term Videos Using Adaptive Memory Network

Cited by: 0

Authors:

Zhong, Shan [1,2,3]

Li, Guoqiang [2]

Ying, Wenhao [1]

Zhao, Fuzhou [4]

Xie, Gengsheng [5]

Gong, Shengrong [1,2,3]

Affiliations:

[1] Changshu Inst Technol, Sch Comp Sci & Engn, Changshu 215500, Jiangsu, Peoples R China

[2] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215000, Jiangsu, Peoples R China

[3] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130000, Peoples R China

[4] Changshu Inst Technol, Sch Automot Engn, Suzhou 215000, Jiangsu, Peoples R China

[5] Jiangxi Normal Univ, Sch Software, Nanchang 330022, Jiangxi, Peoples R China

Source:

IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS | 2024 / Vol. 16 / No. 05

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Feature extraction; Videos; Object recognition; Data mining; Adaptation models; Adaptive systems; Video sequences; Long-term videos; memory network; object segmentation; semisupervised learning;

DOI:

10.1109/TCDS.2024.3385849

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video object segmentation (VOS) uses the first annotated video mask to achieve consistent and precise segmentation in subsequent frames. Recently, memory-based methods have received significant attention owing to their substantial performance enhancements. However, these approaches rely on a fixed global memory strategy, which poses a challenge to segmentation accuracy and speed in the context of longer videos. To alleviate this limitation, we propose a novel semisupervised VOS model, founded on the principles of the adaptive memory network. Our proposed model adaptively extracts object features by focusing on the object area while effectively filtering out extraneous background noise. An identification mechanism is also thoughtfully applied to discern each object in multiobject scenarios. To further reduce storage consumption without compromising the saliency of object information, the outdated features residing in the memory pool are compressed into salient features through the employment of a self-attention mechanism. Furthermore, we introduce a local matching module, specifically devised to refine object features by fusing the contextual information from historical frames. We demonstrate the efficiency of our approach through experiments, substantially augmenting both the speed and precision of segmentation for long-term videos, while maintaining comparable performance for short videos.

Pages: 1789-1802

Page count: 14

50 items in total

[1] Memory-Efficient Continual Learning Object Segmentation for Long Videos
Nazemi, Amir
Shafiee, Mohammad Javad
Gharaee, Zahra
Fieguth, Paul
IEEE ACCESS, 2024, 12 : 97067 - 97084
[2] LTST: Long-term segmentation tracker with memory attention network
Yu, Lang
Qiao, Baojun
Zhang, Huanlong
Yu, Junyang
He, Xin
Image and Vision Computing, 2022, 119
[3] LTST: Long-term segmentation tracker with memory attention network
Yu, Lang
Qiao, Baojun
Zhang, Huanlong
Yu, Junyang
He, Xin
IMAGE AND VISION COMPUTING, 2022, 119
[4] Robust and Efficient Memory Network for Video Object Segmentation
Chen, Yadang
Zhang, Dingwei
Yang, Zhi-Xin
Wu, Enhua
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1769 - 1774
[5] Efficient Regional Memory Network for Video Object Segmentation
Xie, Haozhe
Yao, Hongxun
Zhou, Shangchen
Zhang, Shengping
Sun, Wenxiu
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1286 - 1295
[6] Adaptive Correlation Filters with Long-Term and Short-Term Memory for Object Tracking
Chao Ma
Jia-Bin Huang
Xiaokang Yang
Ming-Hsuan Yang
International Journal of Computer Vision, 2018, 126 : 771 - 796
[7] Adaptive Correlation Filters with Long-Term and Short-Term Memory for Object Tracking
Ma, Chao
Huang, Jia-Bin
Yang, Xiaokang
Yang, Ming-Hsuan
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2018, 126 (08) : 771 - 796
[8] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
Cheng, Ho Kei
Schwing, Alexander G.
COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 640 - 658
[9] Dual Temporal Memory Network for Efficient Video Object Segmentation
Zhang, Kaihua
Wang, Long
Liu, Dong
Liu, Bo
Liu, Qingshan
Li, Zhu
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1515 - 1523
[10] The hippocampus and long-term object memory in the rat
Vnek, N
Rothblat, LA
JOURNAL OF NEUROSCIENCE, 1996, 16 (08) : 2780 - 2787