A Generalized Framework for Video Instance Segmentation

Cited by: 15
Authors
Heo, Miran [1]
Hwang, Sukjun [1]
Hyun, Jeongseok [1]
Kim, Hanjung [1]
Oh, Seoung Wug [2]
Lee, Joon-Young [2]
Kim, Seon Joo [1]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
[2] Adobe Res, San Francisco, CA USA
DOI
10.1109/CVPR52729.2023.01405
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. The key contribution of GenVIS is the learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment. Additionally, we introduce a memory that effectively acquires information from previous states. Thanks to the new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manners. We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS). Notably, we greatly outperform the state-of-the-art on the long VIS benchmark (OVIS), improving by 5.6 AP with a ResNet-50 backbone. Code is available at https://github.com/miranheo/GenVIS.
Pages: 14623-14632
Number of pages: 10