VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

Cited by: 1

Authors:

Wang, Xudong [1]

Misra, Ishan

Zeng, Ziyun

Girdhar, Rohit

Darrell, Trevor

Affiliation:

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024

DOI:

10.1109/CVPR52733.2024.02147

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and experience difficulties tracking small or divergent motions. We present VideoCutLER, a simple method for unsupervised multi-instance video segmentation without using motion-based learning signals like optical flow or training on natural videos. Our key insight is that using high-quality pseudo masks and a simple video synthesis method for model training is surprisingly sufficient to enable the resulting video model to effectively segment and track multiple instances across video frames. We show the first competitive unsupervised learning results on the challenging YouTubeVIS-2019 benchmark, achieving 50.7% AP^{video}_{50}, surpassing the previous state-of-the-art by a large margin. VideoCutLER can also serve as a strong pretrained model for supervised video instance segmentation tasks, exceeding DINO by 15.9% on YouTubeVIS-2019 in terms of AP^{video}.

Pages: 22755-22764

Page count: 10
