Two-shot Video Object Segmentation

Cited by: 12
Authors
Yan, Kun [1 ]
Li, Xiao [2 ]
Wei, Fangyun [2 ]
Wang, Jinglu [2 ]
Zhang, Chenbin [1 ]
Wang, Ping [1 ]
Lu, Yan [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords
DOI
10.1109/CVPR52729.2023.00224
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Previous work on video object segmentation (VOS) trains on densely annotated videos. However, acquiring pixel-level annotations is expensive and time-consuming. In this work, we demonstrate the feasibility of training a satisfactory VOS model on sparsely annotated videos: we require only two labeled frames per training video while sustaining performance. We term this novel training paradigm two-shot video object segmentation, or two-shot VOS for short. The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination of labeled and pseudo-labeled data. Our approach is extremely simple and can be applied to a majority of existing frameworks. We first pre-train a VOS model on sparsely annotated videos in a semi-supervised manner, with the first frame always being a labeled one. Then, we use the pre-trained VOS model to generate pseudo labels for all unlabeled frames, which are stored in a pseudo-label bank. Finally, we retrain a VOS model on both labeled and pseudo-labeled data without any restriction on the first frame. For the first time, we present a general way to train VOS models on two-shot VOS datasets. Using only 7.3% and 2.9% of the labeled data of the YouTube-VOS and DAVIS benchmarks, our approach achieves results comparable to counterparts trained on the fully labeled sets. Code and models are available at https://github.com/ykpku/Two-shot-Video-Object-Segmentation.
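The three-phase pipeline in the abstract (semi-supervised pre-training on the two labeled frames, pseudo-label generation into a bank, retraining on the union) can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the videos, the `pretrain`/`predict` stand-ins, and all names are hypothetical, with strings standing in for frames, masks, and a real segmentation network.

```python
# Toy sketch of the two-shot VOS training paradigm.
# Each video is a list of frame indices; exactly two frames per video
# carry ground-truth labels (here just placeholder strings).
videos = {
    "vid0": {"frames": list(range(5)), "labels": {0: "mask_a", 3: "mask_b"}},
    "vid1": {"frames": list(range(4)), "labels": {1: "mask_c", 2: "mask_d"}},
}

def pretrain(videos):
    """Phase 1: semi-supervised pre-training on the sparsely labeled
    frames (a stand-in; a real model would be a segmentation network
    trained with the first sampled frame always a labeled one)."""
    labeled = [(v, i, m) for v, d in videos.items()
               for i, m in d["labels"].items()]
    return {"seen": labeled}

def predict(model, video_id, frame_idx):
    """Stand-in inference: produce a pseudo mask for an unlabeled frame."""
    return f"pseudo_{video_id}_{frame_idx}"

def build_pseudo_label_bank(model, videos):
    """Phase 2: run the pre-trained model over all unlabeled frames and
    store the predictions in a pseudo-label bank."""
    bank = {}
    for vid, d in videos.items():
        for idx in d["frames"]:
            if idx not in d["labels"]:
                bank[(vid, idx)] = predict(model, vid, idx)
    return bank

def retrain(videos, bank):
    """Phase 3: build the retraining set from labeled plus pseudo-labeled
    frames, with no restriction that the first frame be ground truth."""
    data = []
    for vid, d in videos.items():
        for idx in d["frames"]:
            label = d["labels"].get(idx) or bank[(vid, idx)]
            data.append((vid, idx, label))
    return data

model = pretrain(videos)
bank = build_pseudo_label_bank(model, videos)
train_set = retrain(videos, bank)
```

With two videos of 5 and 4 frames and two labels each, the bank holds the 5 unlabeled frames, and the retraining set covers all 9 frames, mixing ground-truth and pseudo labels.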
Pages: 2257-2267
Page count: 11
Related Papers
50 records in total
  • [31] Two-shot Spatially-varying BRDF and Shape Estimation
    Boss, Mark
    Jampani, Varun
    Kim, Kihwan
    Lensch, Hendrik P. A.
    Kautz, Jan
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 3981 - 3990
  • [33] Breaking the "Object" in Video Object Segmentation
    Tokmakov, Pavel
    Li, Jie
    Gaidon, Adrien
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 22836 - 22845
  • [36] Fast target-aware learning for few-shot video object segmentation
    Chen, Yadang
    Hao, Chuanyan
    Yang, Zhi-Xin
    Wu, Enhua
    SCIENCE CHINA-INFORMATION SCIENCES, 2022, 65 (08)
  • [37] AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation
    Lin, Huaijia
    Qi, Xiaojuan
    Jia, Jiaya
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 3948 - 3956
  • [38] Co-attention Propagation Network for Zero-Shot Video Object Segmentation
    Pei, Gensheng
    Yao, Yazhou
    Shen, Fumin
    Huang, Dan
    Huang, Xingguo
    Shen, Heng-Tao
    arXiv, 2023,
  • [39] Adaptive Multi-Source Predictor for Zero-Shot Video Object Segmentation
    Zhao, Xiaoqi
    Chang, Shijie
    Pang, Youwei
    Yang, Jiaxing
    Zhang, Lihe
    Lu, Huchuan
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (08) : 3232 - 3250