Two-shot Video Object Segmentation

Cited by: 12

Authors:

Yan, Kun [1]

Li, Xiao [2]

Wei, Fangyun [2]

Wang, Jinglu [2]

Zhang, Chenbin [1]

Wang, Ping [1]

Lu, Yan [2]

Affiliations:

[1] Peking Univ, Beijing, Peoples R China

[2] Microsoft Res Asia, Beijing, Peoples R China

Source:

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

Keywords:

DOI:

10.1109/CVPR52729.2023.00224

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Previous video object segmentation (VOS) methods are trained on densely annotated videos, yet acquiring pixel-level annotations is expensive and time-consuming. In this work, we demonstrate the feasibility of training a satisfactory VOS model on sparsely annotated videos: we require only two labeled frames per training video, and performance is sustained. We term this novel training paradigm two-shot video object segmentation, or two-shot VOS for short. The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination of labeled and pseudo-labeled data. Our approach is extremely simple and can be applied to a majority of existing frameworks. We first pre-train a VOS model on sparsely annotated videos in a semi-supervised manner, with the first frame always being a labeled one. Then, we use the pre-trained VOS model to generate pseudo labels for all unlabeled frames, which are subsequently stored in a pseudo-label bank. Finally, we retrain a VOS model on both labeled and pseudo-labeled data without any restrictions on the first frame. For the first time, we present a general way to train VOS models on two-shot VOS datasets. Using 7.3% and 2.9% of the labeled data of the YouTube-VOS and DAVIS benchmarks, respectively, our approach achieves results comparable to those of counterparts trained on the fully labeled sets. Code and models are available at https://github.com/ykpku/Two-shot-Video-Object-Segmentation.
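The three phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code (their implementation is at the repository linked above); every name here (`sample_clip`, `build_pseudo_label_bank`, `lookup_target`, the `predict` callback) is hypothetical, masks are toy lists of integers, and the sampling rule is an assumed simplification that only captures the "first frame must be labeled" restriction and its removal.

```python
import random
from typing import Callable, Dict, List, Tuple

Mask = List[int]            # toy stand-in for a per-pixel segmentation mask
FrameKey = Tuple[int, int]  # (video index, frame index)


def sample_clip(num_frames: int, labeled: List[int], clip_len: int,
                first_must_be_labeled: bool, rng: random.Random) -> List[int]:
    """Sample frame indices for one training clip (hypothetical rule).

    Phase 1 (pre-training): the first (reference) frame is a labeled frame.
    Phase 3 (retraining): any frame may be the first frame.
    """
    candidates = list(labeled) if first_must_be_labeled else list(range(num_frames))
    first = rng.choice(candidates)
    others = [t for t in range(num_frames) if t != first]
    return [first] + rng.sample(others, clip_len - 1)


def build_pseudo_label_bank(labels: Dict[int, Dict[int, Mask]],
                            num_frames: Dict[int, int],
                            predict: Callable[[int, int], Mask]) -> Dict[FrameKey, Mask]:
    """Phase 2: predict a mask for every unlabeled frame and store it."""
    bank: Dict[FrameKey, Mask] = {}
    for video, n in num_frames.items():
        for t in range(n):
            if t not in labels[video]:
                bank[(video, t)] = predict(video, t)
    return bank


def lookup_target(video: int, t: int,
                  labels: Dict[int, Dict[int, Mask]],
                  bank: Dict[FrameKey, Mask]) -> Mask:
    """Phase 3: use the ground-truth mask when it exists, else the pseudo label."""
    if t in labels[video]:
        return labels[video][t]
    return bank[(video, t)]
```

In this sketch, a two-shot video with 8 frames and labels on frames 0 and 5 would contribute 6 entries to the pseudo-label bank, and retraining would draw targets from `lookup_target` regardless of which frame starts the clip.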


Pages: 2257-2267

Page count: 11


