Vanishing mask refinement in semi-supervised video object segmentation

Cited by: 0

Authors:

Pita, Javier [1]

Llerena, Juan P. [2,3]

Patricio, Miguel A. [3]

Berlanga, Antonio [3]

Usero, Luis [2]

Affiliations:

[1] Grp MasMovil MasMovil Team, Ave Bruselas 38, Madrid 28108, Spain

[2] Univ Alcala, Cognit Sci Res Grp, Campus Univ, Ctra Madrid Barcelona km 33,600, Madrid 28805, Spain

[3] Univ Carlos III Madrid, Comp Sci & Engn Dept, Appl Artificial Intelligence Grp, Avda Gregorio Peces-Barba & Martinez,22,Colmenarej, Madrid 28270, Spain

Source:

APPLIED SOFT COMPUTING | 2025 / Vol. 172

Keywords:

Video Object Segmentation; Foundation model; Object Segmentation; Long-term videos; Deep learning; FRAMEWORK;

DOI:

10.1016/j.asoc.2025.112837

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents Video Object Segmentation Enhanced with Segment Anything Model (VOS-E-SAM), a multistage architecture for Semi-supervised Video Object Segmentation (SVOS) built on the foundational Segment Anything Model (SAM) architecture and aimed at the problem of mask degradation over time in long video sequences. Our approach refines the object masks produced by the XMem model by incorporating SAM. The integration uses various input combinations and low-level computer vision techniques to generate point prompts, in order to improve mask continuity and accuracy throughout the entire video. The main challenge addressed is the fading or vanishing of object masks in long video sequences caused by changes in object appearance, occlusions, camera movements, and variations in approach. Both the baseline architecture and the newer high-quality version are tested. Through rigorous experimentation with different prompt configurations, we identified an outstanding configuration of SAM inputs for mask refinement. Evaluations on benchmark long-video datasets, such as LongDataset and LVOS, show that our approach significantly improves mask quality in single-object extended video sequences, as shown by percentage increments in metrics based on the Jaccard index (J) and contour accuracy (F): mean, recall, and decay. Our results show remarkable improvements in mask persistence and accuracy, setting a new standard for the integration of foundational models in video segmentation and laying the foundation for future research in this field. Github.VOS-E-SAM
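As an illustration of the region-similarity metric named in the abstract, the following is a minimal sketch of the Jaccard index (J) between a predicted and a ground-truth binary mask, together with its per-sequence mean and recall. It is not taken from the paper's code: the function names, the nested-list mask representation, the convention that two empty masks score 1.0, and the recall threshold of 0.5 are all assumptions made here for illustration.

```python
def jaccard_index(pred, gt):
    """Region similarity J = |pred AND gt| / |pred OR gt| for two binary masks.

    Masks are nested lists of 0/1 values with the same shape (an assumption
    made here to keep the sketch dependency-free).
    """
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_mean(scores):
    """Average of the per-frame J scores of one sequence."""
    return sum(scores) / len(scores)


def j_recall(scores, threshold=0.5):
    """Fraction of frames whose J exceeds the threshold (0.5 is assumed here)."""
    return sum(1 for s in scores if s > threshold) / len(scores)


# Toy two-frame sequence (hypothetical data).
pred_masks = [[[0, 1, 1], [0, 1, 0]], [[1, 1, 0], [0, 0, 0]]]
gt_masks = [[[0, 1, 0], [0, 1, 1]], [[1, 1, 0], [0, 0, 0]]]
scores = [jaccard_index(p, g) for p, g in zip(pred_masks, gt_masks)]
print(scores, j_mean(scores), j_recall(scores))
```

A mask that fades over a long sequence shows up as per-frame J scores dropping toward zero in later frames, which lowers both the mean and the recall.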

Pages: 14
