MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model

Cited by: 0
Authors
Zhang, Zhenghao [1]
Zhang, Shengfan [1]
Dai, Zuozhuo [1]
Dong, Zilong [1]
Zhu, Siyu [2]
Affiliations
[1] Alibaba Grp, Hangzhou 310030, Peoples R China
[2] Fudan Univ, Shanghai 200433, Peoples R China
Keywords
Vision foundation model; Video instance segmentation; Deep learning
DOI
10.1016/j.patcog.2024.111100
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The current state-of-the-art techniques for video object segmentation require extensive training on video datasets with mask annotations, which constrains their zero-shot transfer to new image distributions and tasks. However, recent advances in foundation models, particularly in image segmentation, have shown robust generalization capabilities, introducing a novel prompt-driven paradigm for a variety of downstream segmentation challenges on new data distributions. This study explores the potential of vision foundation models under diverse prompt strategies and proposes a mask-free approach for unsupervised video object segmentation. To further improve the efficacy of prompt learning in diverse and complex video scenes, we introduce a spatial-temporal decoupled deformable attention mechanism that establishes an effective correlation between intra- and inter-frame features. Extensive experiments on the DAVIS2017-unsupervised, YoutubeVIS19&21, and OIVS datasets demonstrate the superior performance of the proposed approach without mask supervision when compared to existing mask-supervised methods, as well as its capacity to generalize to weakly-annotated video datasets.
Pages: 12