Vanishing mask refinement in semi-supervised video object segmentation

Cited by: 0
Authors
Pita, Javier [1]
Llerena, Juan P. [2, 3]
Patricio, Miguel A. [3]
Berlanga, Antonio [3]
Usero, Luis [2]
Affiliations
[1] Grp MasMovil MasMovil Team, Ave Bruselas 38, Madrid 28108, Spain
[2] Univ Alcala, Cognit Sci Res Grp, Campus Univ, Ctra Madrid Barcelona km 33,600, Madrid 28805, Spain
[3] Univ Carlos III Madrid, Comp Sci & Engn Dept, Appl Artificial Intelligence Grp, Avda Gregorio Peces-Barba & Martinez,22,Colmenarej, Madrid 28270, Spain
Keywords
Video Object Segmentation; Foundation model; Object Segmentation; Long-term videos; Deep learning; FRAMEWORK;
DOI
10.1016/j.asoc.2025.112837
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents Video Object Segmentation Enhanced with Segment Anything Model (VOS-E-SAM), a multistage architecture for Semi-supervised Video Object Segmentation (SVOS) that uses the foundational Segment Anything Model (SAM) architecture to address mask degradation over time in long video sequences. Our approach refines the object masks produced by the XMem model by incorporating SAM. This integration uses various input combinations and low-level computer vision techniques to generate point prompts, in order to improve mask continuity and accuracy throughout the entire video. The main challenge addressed is the fading or vanishing of object masks in long video sequences, caused by changes in object appearance, occlusions, camera movements, and approach changes. Both the baseline architecture and the newer high-quality version are tested. Through experimentation with different prompt configurations, we identified a configuration of SAM inputs that improves mask refinement. Evaluations on benchmark long-video datasets, such as LongDataset and LVOS, show that our approach significantly improves mask quality in single-object extended video sequences, as shown by percentage increases in the mean, recall, and decay metrics based on the Jaccard index (J) and contour accuracy (F). Our results show remarkable improvements in mask persistence and accuracy, setting a new standard for the integration of foundational models in video segmentation and laying the foundation for future research in this field. GitHub: VOS-E-SAM
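The J-based metrics named in the abstract (mean, recall, decay) can be illustrated with a minimal sketch. It is not taken from the paper or its repository: the function names `jaccard` and `summarize`, the recall threshold of 0.5, and the four-bin decay are assumptions that follow the common DAVIS-style convention.

```python
import numpy as np


def jaccard(pred, gt):
    """Jaccard index (intersection over union) of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: treated here as a perfect match (assumption).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def summarize(scores, recall_threshold=0.5, bins=4):
    """Mean, recall and decay of per-frame scores for one sequence.

    Assumed convention: recall is the fraction of frames scoring above
    the threshold; decay is the mean of the first temporal bin minus
    the mean of the last one, so a positive value means quality drops
    over time.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.size < bins:
        raise ValueError("need at least one frame per bin")
    chunks = np.array_split(scores, bins)
    return {
        "mean": float(scores.mean()),
        "recall": float((scores > recall_threshold).mean()),
        "decay": float(chunks[0].mean() - chunks[-1].mean()),
    }
```

Under these assumptions, a mask that fades over a long sequence shows up as a positive decay value, which is the symptom the abstract describes as vanishing masks.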
Pages: 14
Related papers
  • [1] Semi-Supervised Video Object Segmentation with Super-Trajectories
    Wang, Wenguan
    Shen, Jianbing
    Porikli, Fatih
    Yang, Ruigang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2019, 41 (04) : 985 - 998
  • [2] Spatial constraint for efficient semi-supervised video object segmentation
    Chen, Yadang
    Ji, Chuanjun
    Yang, Zhi-Xin
    Wu, Enhua
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 237
  • [3] Semi-supervised Video Object Segmentation with Recurrent Neural Network
    Ren, Xuanguang
    Pan, Han
    Jing, Zhongliang
    Gao, Lei
    CONFERENCE PROCEEDINGS OF 2019 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2019), 2019,
  • [4] Separable Structure Modeling for Semi-Supervised Video Object Segmentation
    Zhu, Wencheng
    Li, Jiahao
    Lu, Jiwen
    Zhou, Jie
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (01) : 330 - 344
  • [5] Learning Object Deformation and Motion Adaption for Semi-supervised Video Object Segmentation
    Zheng, Xiaoyang
    Tan, Xin
    Guo, Jianming
    Ma, Lizhuang
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 8655 - 8662
  • [6] Spatio-temporal compression for semi-supervised video object segmentation
    Ji, Chuanjun
    Chen, Yadang
    Yang, Zhi-Xin
    Wu, Enhua
    VISUAL COMPUTER, 2023, 39 (10): : 4929 - 4942
  • [7] CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing
    Duarte, Kevin
    Rawat, Yogesh S.
    Shah, Mubarak
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 8479 - 8488
  • [8] SiamPolar: Semi-supervised realtime video object segmentation with polar representation
    Li, Yaochen
    Hong, Yuhui
    Song, Yonghong
    Zhu, Chao
    Zhang, Ying
    Wang, Ruihao
    NEUROCOMPUTING, 2022, 467 : 491 - 503
  • [9] Semi-supervised Video Object Segmentation Using Parallel Coattention Network
    Chakraborty, Sangramjit
    Mahapatra, Monalisha
    Nandy, Anup
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2023, 2023, 14301 : 449 - 456