MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model

Cited by: 0

Authors:

Zhang, Zhenghao [1]

Zhang, Shengfan [1]

Dai, Zuozhuo [1]

Dong, Zilong [1]

Zhu, Siyu [2]

Affiliations:

[1] Alibaba Grp, Hangzhou 310030, Peoples R China

[2] Fudan Univ, Shanghai 200433, Peoples R China

Source:

PATTERN RECOGNITION | 2025 / Vol. 159

Keywords:

Vision foundation model; Video instance segmentation; Deep learning;

DOI:

10.1016/j.patcog.2024.111100

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Current state-of-the-art techniques for video object segmentation require extensive training on video datasets with mask annotations, which limits their zero-shot transfer to new image distributions and tasks. However, recent advances in foundation models, particularly in image segmentation, have shown robust generalization capabilities, introducing a novel prompt-driven paradigm for a variety of downstream segmentation challenges on new data distributions. This study explores the potential of vision foundation models using diverse prompt strategies and proposes a mask-free approach for unsupervised video object segmentation. To further improve the efficacy of prompt learning in diverse and complex video scenes, we introduce a spatial-temporal decoupled deformable attention mechanism to establish an effective correlation between intra- and inter-frame features. Extensive experiments on the DAVIS2017-unsupervised, YoutubeVIS19&21, and OIVS datasets demonstrate the superior performance of the proposed approach without mask supervision compared to existing mask-supervised methods, as well as its capacity to generalize to weakly-annotated video datasets.


Pages: 12

