Vanishing mask refinement in semi-supervised video object segmentation

Citations: 0
Authors
Pita, Javier [1]
Llerena, Juan P. [2, 3]
Patricio, Miguel A. [3]
Berlanga, Antonio [3]
Usero, Luis [2]
Affiliations
[1] Grp MasMovil MasMovil Team, Ave Bruselas 38, Madrid 28108, Spain
[2] Univ Alcala, Cognit Sci Res Grp, Ctra Madrid Barcelona km, Campus Univ,Ctra Madrid Barcelona km,33,600, Madrid 28805, Spain
[3] Univ Carlos III Madrid, Comp Sci & Engn Dept, Appl Artificial Intelligence Grp, Avda Gregorio Peces-Barba & Martinez,22,Colmenarej, Madrid 28270, Spain
Keywords
Video Object Segmentation; Foundation model; Object Segmentation; Long-term videos; Deep learning; Framework
DOI
10.1016/j.asoc.2025.112837
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents Video Object Segmentation Enhanced with Segment Anything Model (VOS-E-SAM), a multistage architecture for Semi-supervised Video Object Segmentation (SVOS) built on the foundational Segment Anything Model (SAM) and aimed at the degradation of masks over time in long video sequences. The approach enhances the object masks produced by the XMem model by incorporating SAM. This integration uses various input combinations and low-level computer vision techniques to generate point prompts, with the goal of improving mask continuity and accuracy throughout the entire video. The main challenge addressed is the fading or vanishing of object masks in long video sequences, caused by changes in object appearance, occlusions, camera movements, and changes in approach. Both the baseline architecture and the newer high-quality version are tested. Through experimentation with different prompt configurations, we identified the configuration of SAM inputs that gave the best mask refinement. Evaluations on benchmark long-video datasets, such as LongDataset and LVOS, show that our approach significantly improves mask quality in single-object extended video sequences, as shown by percentage increases in metrics based on the Jaccard index (J) and contour accuracy (F): mean, recall, and decay. The results show improvements in mask persistence and accuracy and lay a foundation for future research on integrating foundational models into video segmentation. Code: GitHub (VOS-E-SAM).
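The J-based summaries named in the abstract (mean, recall, decay) can be illustrated with a minimal sketch. It assumes the DAVIS-style definitions: recall is the fraction of frames whose score exceeds 0.5, and decay is the mean score over the first quarter of the frames minus the mean over the last quarter. The abstract does not state the exact evaluation protocol, so these definitions, the 0.5 threshold, and the function names `jaccard` and `summarize` are assumptions made for illustration only.

```python
from statistics import mean


def jaccard(pred, gt):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def summarize(per_frame, threshold=0.5, bins=4):
    """Mean, recall, and decay of a per-frame score sequence.

    Assumes len(per_frame) >= bins so that every chunk is non-empty.
    """
    n = len(per_frame)
    # Split the frames into `bins` contiguous chunks of near-equal size.
    edges = [round(i * n / bins) for i in range(bins + 1)]
    chunks = [per_frame[edges[i]:edges[i + 1]] for i in range(bins)]
    return {
        "mean": mean(per_frame),
        "recall": sum(s > threshold for s in per_frame) / n,
        # Positive decay means the score drops toward the end of the video.
        "decay": mean(chunks[0]) - mean(chunks[-1]),
    }


# Hypothetical per-frame J scores for a sequence whose mask fades over time.
scores = [0.9, 0.9, 0.8, 0.8, 0.6, 0.6, 0.4, 0.2]
stats = summarize(scores)
```

Under these assumed definitions, a large positive decay is the numerical signature of the vanishing-mask problem: the early frames score well and the late frames do not. A refinement stage that keeps masks alive should push decay toward zero while raising mean and recall.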
Pages: 14
Related papers
50 items in total
  • [21] Semi-supervised one-shot learning for video object segmentation in dynamic environments
    Dinesh Elayaperumal
    Sachin Sakthi K S
    Jae Hoon Jeong
    Young Hoon Joo
    Multimedia Tools and Applications, 2025, 84 (6) : 3095 - 3115
  • [22] Semi-supervised spatial-temporal calibration and semantic refinement network for video polyp segmentation
    Li, Feng
    Huang, Zetao
    Zhou, Lu
    Peng, Haixia
    Chu, Yimin
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 100
  • [23] Subdivided Mask Dispersion Framework for semi-supervised semantic segmentation
    Wang, Yooseung
    Jang, Jaehyuk
    Kim, Changick
    PATTERN RECOGNITION LETTERS, 2024, 179 : 58 - 64
  • [24] Semi-supervised statistical region refinement for color image segmentation
    Nock, R
    Nielsen, F
    PATTERN RECOGNITION, 2005, 38 (06) : 835 - 846
  • [25] PICK: Predict and Mask for Semi-supervised Medical Image Segmentation
    Zeng, Qingjie
    Lu, Zilin
    Xie, Yutong
    Xia, Yong
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025
  • [26] Contextual Guided Segmentation Framework for Semi-supervised Video Instance Segmentation
    Le, Trung-Nghia
    Nguyen, Tam V.
    Tran, Minh-Triet
    MACHINE VISION AND APPLICATIONS, 2022, 33 (02)
  • [27] Contextual Guided Segmentation Framework for Semi-supervised Video Instance Segmentation
    Trung-Nghia Le
    Tam V. Nguyen
    Minh-Triet Tran
    Machine Vision and Applications, 2022, 33
  • [28] Semi-supervised Video Object Segmentation Via an Edge Attention Gated Graph Convolutional Network
    Zhang, Yuqing
    Zhang, Yong
    Wang, Shaofan
    Liang, Yun
    Yin, Baocai
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (01)
  • [29] Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation
    Park, Hyojin
    Yoo, Jayeon
    Jeong, Seohyeong
    Venkatesh, Ganesh
    Kwak, Nojun
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 8401 - 8410
  • [30] Semi-Supervised Video Object Segmentation via Learning Object-Aware Global-Local Correspondence
    Fan, Jiaqing
    Liu, Bo
    Zhang, Kaihua
    Liu, Qingshan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12) : 8153 - 8164