A Generalized Framework for Video Instance Segmentation

Cited by: 15

Authors:

Heo, Miran [1]

Hwang, Sukjun [1]

Hyun, Jeongseok [1]

Kim, Hanjung [1]

Oh, Seoung Wug [2]

Lee, Joon-Young [2]

Kim, Seon Joo [1]

Affiliations:

[1] Yonsei University, Seoul, South Korea

[2] Adobe Research, San Francisco, CA, USA

Source:

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.01405

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. The key contribution of GenVIS is the learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment. Additionally, we introduce a memory that effectively acquires information from previous states. Thanks to the new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manners. We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS). Notably, we greatly outperform the state-of-the-art on the long VIS benchmark (OVIS), improving by 5.6 AP with a ResNet-50 backbone. Code is available at https://github.com/miranheo/GenVIS.

Pages: 14623-14632

Page count: 10
