Two-shot Video Object Segmentation

Cited by: 12
Authors
Yan, Kun [1 ]
Li, Xiao [2 ]
Wei, Fangyun [2 ]
Wang, Jinglu [2 ]
Zhang, Chenbin [1 ]
Wang, Ping [1 ]
Lu, Yan [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
DOI
10.1109/CVPR52729.2023.00224
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Previous work on video object segmentation (VOS) trains models on densely annotated videos. However, acquiring pixel-level annotations is expensive and time-consuming. In this work, we demonstrate the feasibility of training a satisfactory VOS model on sparsely annotated videos: we require only two labeled frames per training video, and performance is sustained. We term this novel training paradigm two-shot video object segmentation, or two-shot VOS for short. The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination of labeled and pseudo-labeled data. Our approach is extremely simple and can be applied to a majority of existing frameworks. We first pre-train a VOS model on sparsely annotated videos in a semi-supervised manner, with the first frame always being a labeled one. Then, we use the pre-trained VOS model to generate pseudo labels for all unlabeled frames, which are subsequently stored in a pseudo-label bank. Finally, we retrain a VOS model on both labeled and pseudo-labeled data without any restriction on the first frame. For the first time, we present a general way to train VOS models on two-shot VOS datasets. Using 7.3% and 2.9% of the labeled data of the YouTube-VOS and DAVIS benchmarks, respectively, our approach achieves results comparable to those of counterparts trained on the fully labeled sets. Code and models are available at https://github.com/ykpku/Two-shot-Video-Object-Segmentation.
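The three steps in the abstract (pre-train with a labeled first frame, fill a pseudo-label bank, retrain without the first-frame restriction) can be illustrated with a toy data-handling sketch. It is an assumption-laden illustration, not the authors' implementation: the names `sample_clip`, `build_pseudo_label_bank`, `lookup_label`, the `predict` callable, and the string stand-ins for masks are all hypothetical, and temporal ordering of the sampled frames is ignored.

```python
import random
from typing import Callable, Dict, List, Tuple

FrameKey = Tuple[str, int]  # (video_id, frame_index); hypothetical key format


def sample_clip(num_frames: int, labeled: List[int], clip_len: int,
                rng: random.Random, first_must_be_labeled: bool) -> List[int]:
    """Pick frame indices for one training clip.

    With first_must_be_labeled=True the first frame is always a labeled one
    (the pre-training step). With False, any frame may come first (the
    retraining step, once every frame has a label of some kind).
    """
    if first_must_be_labeled:
        first = rng.choice(sorted(labeled))
    else:
        first = rng.randrange(num_frames)
    others = [i for i in range(num_frames) if i != first]
    return [first] + rng.sample(others, clip_len - 1)


def build_pseudo_label_bank(videos: Dict[str, int],
                            ground_truth: Dict[FrameKey, str],
                            predict: Callable[[str, int], str]) -> Dict[FrameKey, str]:
    """Store a predicted label for every frame that has no ground truth.

    `predict` stands in for the pre-trained model; it is a placeholder.
    """
    bank: Dict[FrameKey, str] = {}
    for video_id, num_frames in videos.items():
        for idx in range(num_frames):
            key = (video_id, idx)
            if key not in ground_truth:
                bank[key] = predict(video_id, idx)
    return bank


def lookup_label(key: FrameKey, ground_truth: Dict[FrameKey, str],
                 bank: Dict[FrameKey, str]) -> str:
    """Prefer the human annotation; fall back to the pseudo-label bank."""
    return ground_truth[key] if key in ground_truth else bank[key]


# Toy data: one 6-frame video with two labeled frames (the two-shot setting).
videos = {"vid_a": 6}
ground_truth = {("vid_a", 0): "gt_mask_0", ("vid_a", 4): "gt_mask_4"}
rng = random.Random(0)

# Step 1: clips for pre-training always start from a labeled frame.
clip_pretrain = sample_clip(6, [0, 4], 3, rng, first_must_be_labeled=True)

# Step 2: fill the bank with predictions for all unlabeled frames.
bank = build_pseudo_label_bank(videos, ground_truth,
                               lambda v, i: f"pseudo_mask_{i}")

# Step 3: clips for retraining may start anywhere; labels come from either source.
clip_retrain = sample_clip(6, [0, 4], 3, rng, first_must_be_labeled=False)
labels_retrain = [lookup_label(("vid_a", i), ground_truth, bank)
                  for i in clip_retrain]
```

Keeping the bank separate from the ground-truth dictionary, as assumed here, makes it easy to prefer human annotations wherever they exist.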
Pages: 2257-2267
Page count: 11