MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model

Authors
Zhang, Zhenghao [1]
Zhang, Shengfan [1]
Dai, Zuozhuo [1]
Dong, Zilong [1]
Zhu, Siyu [2]
Affiliations
[1] Alibaba Grp, Hangzhou 310030, Peoples R China
[2] Fudan Univ, Shanghai 200433, Peoples R China
Keywords
Vision foundation model; Video instance segmentation; Deep learning
DOI
10.1016/j.patcog.2024.111100
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The current state-of-the-art techniques for video object segmentation require extensive training on video datasets with mask annotations, which constrains their zero-shot transfer to new image distributions and tasks. However, recent advances in foundation models, particularly in image segmentation, have shown robust generalization capabilities, introducing a novel prompt-driven paradigm for a variety of downstream segmentation challenges on new data distributions. This study explores the potential of vision foundation models using diverse prompt strategies and proposes a mask-free approach for unsupervised video object segmentation. To further improve the efficacy of prompt learning in diverse and complex video scenes, we introduce a spatial-temporal decoupled deformable attention mechanism to establish an effective correlation between intra- and inter-frame features. Extensive experiments conducted on the DAVIS2017-unsupervised, YoutubeVIS19&21, and OIVS datasets demonstrate the superior performance of the proposed approach without mask supervision when compared to existing mask-supervised methods, as well as its capacity to generalize to weakly-annotated video datasets.
Pages: 12