Two-shot Video Object Segmentation

Cited by: 12
Authors
Yan, Kun [1 ]
Li, Xiao [2 ]
Wei, Fangyun [2 ]
Wang, Jinglu [2 ]
Zhang, Chenbin [1 ]
Wang, Ping [1 ]
Lu, Yan [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords
DOI
10.1109/CVPR52729.2023.00224
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Previous work on video object segmentation (VOS) trains on densely annotated videos. However, acquiring pixel-level annotations is expensive and time-consuming. In this work, we demonstrate the feasibility of training a satisfactory VOS model on sparsely annotated videos: we require only two labeled frames per training video while sustaining performance. We term this novel training paradigm two-shot video object segmentation, or two-shot VOS for short. The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination of labeled and pseudo-labeled data. Our approach is extremely simple and can be applied to a majority of existing frameworks. We first pre-train a VOS model on sparsely annotated videos in a semi-supervised manner, with the first frame always being a labeled one. Then, we use the pre-trained VOS model to generate pseudo labels for all unlabeled frames, which are stored in a pseudo-label bank. Finally, we retrain a VOS model on both labeled and pseudo-labeled data without any restriction on the first frame. For the first time, we present a general way to train VOS models on two-shot VOS datasets. Using only 7.3% and 2.9% of the labeled data of the YouTube-VOS and DAVIS benchmarks, our approach achieves results comparable to counterparts trained on the fully labeled sets. Code and models are available at https://github.com/ykpku/Two-shot-Video-Object-Segmentation.
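The three-phase pipeline in the abstract (semi-supervised pre-training on the two labeled frames, pseudo-label generation into a bank, retraining on the union) can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the videos, the `pretrain`/`predict` stand-ins, and all names are hypothetical, with strings standing in for frames, masks, and a real segmentation network.

```python
# Toy sketch of the two-shot VOS training paradigm.
# Each video is a list of frame indices; exactly two frames per video
# carry ground-truth labels (here just placeholder strings).
videos = {
    "vid0": {"frames": list(range(5)), "labels": {0: "mask_a", 3: "mask_b"}},
    "vid1": {"frames": list(range(4)), "labels": {1: "mask_c", 2: "mask_d"}},
}

def pretrain(videos):
    """Phase 1: semi-supervised pre-training on the sparsely labeled
    frames (a stand-in; a real model would be a segmentation network
    trained with the first sampled frame always a labeled one)."""
    labeled = [(v, i, m) for v, d in videos.items()
               for i, m in d["labels"].items()]
    return {"seen": labeled}

def predict(model, video_id, frame_idx):
    """Stand-in inference: produce a pseudo mask for an unlabeled frame."""
    return f"pseudo_{video_id}_{frame_idx}"

def build_pseudo_label_bank(model, videos):
    """Phase 2: run the pre-trained model over all unlabeled frames and
    store the predictions in a pseudo-label bank."""
    bank = {}
    for vid, d in videos.items():
        for idx in d["frames"]:
            if idx not in d["labels"]:
                bank[(vid, idx)] = predict(model, vid, idx)
    return bank

def retrain(videos, bank):
    """Phase 3: build the retraining set from labeled plus pseudo-labeled
    frames, with no restriction that the first frame be ground truth."""
    data = []
    for vid, d in videos.items():
        for idx in d["frames"]:
            label = d["labels"].get(idx) or bank[(vid, idx)]
            data.append((vid, idx, label))
    return data

model = pretrain(videos)
bank = build_pseudo_label_bank(model, videos)
train_set = retrain(videos, bank)
```

With two videos of 5 and 4 frames and two labels each, the bank holds the 5 unlabeled frames, and the retraining set covers all 9 frames, mixing ground-truth and pseudo labels.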
Pages: 2257-2267
Page count: 11
Related Papers
50 records in total
  • [31] Two-shot Spatially-varying BRDF and Shape Estimation
    Boss, Mark
    Jampani, Varun
    Kim, Kihwan
    Lensch, Hendrik P. A.
    Kautz, Jan
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 3981 - 3990
  • [33] Breaking the "Object" in Video Object Segmentation
    Tokmakov, Pavel
    Li, Jie
    Gaidon, Adrien
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 22836 - 22845
  • [36] Fast target-aware learning for few-shot video object segmentation
    Chen, Yadang
    Hao, Chuanyan
    Yang, Zhi-Xin
    Wu, Enhua
    SCIENCE CHINA-INFORMATION SCIENCES, 2022, 65 (08)
  • [37] AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation
    Lin, Huaijia
    Qi, Xiaojuan
    Jia, Jiaya
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 3948 - 3956
  • [38] Co-attention Propagation Network for Zero-Shot Video Object Segmentation
    Pei, Gensheng
    Yao, Yazhou
    Shen, Fumin
    Huang, Dan
    Huang, Xingguo
    Shen, Heng-Tao
    arXiv, 2023,
  • [39] Adaptive Multi-Source Predictor for Zero-Shot Video Object Segmentation
    Zhao, Xiaoqi
    Chang, Shijie
    Pang, Youwei
    Yang, Jiaxing
    Zhang, Lihe
    Lu, Huchuan
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (08) : 3232 - 3250