MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model

Cited by: 0

Authors:

Zhang, Zhenghao ^{[1]}

Zhang, Shengfan ^{[1]}

Dai, Zuozhuo ^{[1]}

Dong, Zilong ^{[1]}

Zhu, Siyu ^{[2]}

Affiliations:

[1] Alibaba Grp, Hangzhou 310030, Peoples R China

[2] Fudan Univ, Shanghai 200433, Peoples R China

Source:

PATTERN RECOGNITION | 2025 / Vol. 159

Keywords:

Vision foundation model; Video instance segmentation; Deep learning;

DOI:

10.1016/j.patcog.2024.111100

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The current state-of-the-art techniques for video object segmentation necessitate extensive training on video datasets with mask annotations, thereby constraining their ability to transfer zero-shot learning to new image distributions and tasks. However, recent advancements in foundation models, particularly in the domain of image segmentation, have showcased robust generalization capabilities, introducing a novel prompt-driven paradigm for a variety of downstream segmentation challenges on new data distributions. This study delves into the potential of vision foundation models using diverse prompt strategies and proposes a mask-free approach for unsupervised video object segmentation. To further improve the efficacy of prompt learning in diverse and complex video scenes, we introduce a spatial-temporal decoupled deformable attention mechanism to establish an effective correlation between intra- and inter-frame features. Extensive experiments conducted on the DAVIS2017-unsupervised, YoutubeVIS19&21, and OIVS datasets demonstrate the superior performance of the proposed approach without mask supervision when compared to existing mask-supervised methods, as well as its capacity to generalize to weakly-annotated video datasets.


Pages: 12

Total: 50 records

[31] Weakly-Supervised Video Object Grounding via Learning Uni-Modal Associations
Wang, Wei
Gao, Junyu
Xu, Changsheng
IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 6329 - 6340
[32] Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions
Li, Shuang
Du, Yilun
Torralba, Antonio
Sivic, Josef
Russell, Bryan
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1825 - 1835
[33] OWS-Seg: Online Weakly Supervised Video Instance Segmentation via Contrastive Learning
Ning, Yuanxiang
Li, Fei
Dong, Mengping
Li, Zhenbo
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII, 2023, 14260 : 476 - 488
[34] FM-ABS: Promptable Foundation Model Drives Active Barely Supervised Learning for 3D Medical Image Segmentation
Xu, Zhe
Chen, Cheng
Lu, Donghuan
Sun, Jinghan
Wei, Dong
Zheng, Yefeng
Li, Quanzheng
Tong, Raymond Kai-yu
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VIII, 2024, 15008 : 294 - 304
[35] The Devil is in the Points: Weakly Semi-Supervised Instance Segmentation via Point-Guided Mask Representation
Kim, Beomyoung
Jeong, Joonhyun
Han, Dongyoon
Hwang, Sung Ju
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11360 - 11370
[36] Weakly Supervised Volumetric Segmentation via Self-taught Shape Denoising Model
He, Qian
Li, Shuailin
He, Xuming
MEDICAL IMAGING WITH DEEP LEARNING, VOL 143, 2021, 143 : 268 - 285
[37] Semi-Supervised Video Object Segmentation via Learning Object-Aware Global-Local Correspondence
Fan, Jiaqing
Liu, Bo
Zhang, Kaihua
Liu, Qingshan
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12) : 8153 - 8164
[38] Uncertainty Estimation via Response Scaling for Pseudo-Mask Noise Mitigation in Weakly-Supervised Semantic Segmentation
Li, Yi
Duan, Yiqun
Kuang, Zhanghui
Chen, Yimin
Zhang, Wayne
Li, Xiaomeng
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1447 - 1455
[39] Learning Saliency-Free Model with Generic Features for Weakly-Supervised Semantic Segmentation
Luo, Wenfeng
Yang, Meng
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11717 - 11724
[40] Semi-supervised Video Object Segmentation Via an Edge Attention Gated Graph Convolutional Network
Zhang, Yuqing
Zhang, Yong
Wang, Shaofan
Liang, Yun
Yin, Baocai
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (01)
