XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Cited by: 118

Authors:

Cheng, Ho Kei [1]

Schwing, Alexander G. [1]

Affiliations:

[1] Univ Illinois, Champaign, IL 61820 USA

Source:

COMPUTER VISION - ECCV 2022, PT XXVIII | 2022 / Vol. 13688

Keywords:

DOI:

10.1007/978-3-031-19815-1_37

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically only uses one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact thus sustained long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets.

Pages: 640-658

Number of pages: 19

