A Generalized Framework for Video Instance Segmentation

Cited by: 15
Authors
Heo, Miran [1]
Hwang, Sukjun [1]
Hyun, Jeongseok [1]
Kim, Hanjung [1]
Oh, Seoung Wug [2]
Lee, Joon-Young [2]
Kim, Seon Joo [1]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
[2] Adobe Res, San Francisco, CA USA
DOI
10.1109/CVPR52729.2023.01405
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. The key contribution of GenVIS is the learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment. Additionally, we introduce a memory that effectively acquires information from previous states. Thanks to the new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manners. We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS). Notably, we greatly outperform the state-of-the-art on the long VIS benchmark (OVIS), improving by 5.6 AP with a ResNet-50 backbone. Code is available at https://github.com/miranheo/GenVIS.
Pages: 14623-14632
Page count: 10