A Generalized Framework for Video Instance Segmentation

Cited by: 15
Authors
Heo, Miran [1 ]
Hwang, Sukjun [1 ]
Hyun, Jeongseok [1 ]
Kim, Hanjung [1 ]
Oh, Seoung Wug [2 ]
Lee, Joon-Young [2 ]
Kim, Seon Joo [1 ]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
[2] Adobe Res, San Francisco, CA USA
DOI
10.1109/CVPR52729.2023.01405
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. The key contribution of GenVIS is the learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment. Additionally, we introduce a memory that effectively acquires information from previous states. Thanks to the new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manners. We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS). Notably, we greatly outperform the state-of-the-art on the long VIS benchmark (OVIS), improving by 5.6 AP with a ResNet-50 backbone. Code is available at https://github.com/miranheo/GenVIS.
Pages: 14623-14632
Page count: 10
Related papers
50 in total
  • [21] Video Instance Segmentation in an Open-World
    Thawakar, Omkar
    Narayan, Sanath
    Cholakkal, Hisham
    Anwer, Rao Muhammad
    Khan, Salman
    Laaksonen, Jorma
    Shah, Mubarak
    Khan, Fahad Shahbaz
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, 133 (01) : 398 - 409
  • [22] InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation
    He, Fei
    Zhang, Haoyang
    Gao, Naiyu
    Jia, Jian
    Shan, Yanhu
    Zhao, Xin
    Huang, Kaiqi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [23] MaskRNN: Instance Level Video Object Segmentation
    Hu, Yuan-Ting
    Huang, Jia-Bin
    Schwing, Alexander G.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [24] SeqFormer: Sequential Transformer for Video Instance Segmentation
    Wu, Junfeng
    Jiang, Yi
    Bai, Song
    Zhang, Wenqing
    Bai, Xiang
    COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 553 - 569
  • [25] Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation
    Zeng, Chengxi
    Yang, Xinyu
    Smithard, David
    Mirmehdi, Majid
    Gambaruto, Alberto M.
    Burghardt, Tilo
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2470 - 2474
  • [26] A Framework for Contextual Recommendations Using Instance Segmentation
    Tsiktsiris, Dimitris
    Dimitriou, Nikolaos
    Kolias, Zisis
    Skourti, Stavri
    Girssas, Paul
    Lalas, Antonios
    Votis, Konstantinos
    Tzovaras, Dimitrios
    ARTIFICIAL INTELLIGENCE IN HCI, AI-HCI 2023, PT II, 2023, 14051 : 395 - 408
  • [27] Hybrid Instance-Aware Temporal Fusion for Online Video Instance Segmentation
    Li, Xiang
    Wang, Jinglu
    Li, Xiao
    Lu, Yan
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1429 - 1437
  • [28] Video Mask Transfiner for High-Quality Video Instance Segmentation
    Ke, Lei
    Ding, Henghui
    Danelljan, Martin
    Tai, Yu-Wing
    Tang, Chi-Keung
    Yu, Fisher
    COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 731 - 747
  • [29] Instance Embedding Transfer to Unsupervised Video Object Segmentation
    Li, Siyang
    Seybold, Bryan
    Vorobyov, Alexey
    Fathi, Alireza
    Huang, Qin
    Kuo, C. -C. Jay
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6526 - 6535
  • [30] TCOVIS: Temporally Consistent Online Video Instance Segmentation
    Li, Junlong
    Yu, Bingyao
    Rao, Yongming
    Zhou, Jie
    Lu, Jiwen
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1097 - 1107