A Generalized Framework for Video Instance Segmentation

Cited by: 15
Authors
Heo, Miran [1]
Hwang, Sukjun [1]
Hyun, Jeongseok [1]
Kim, Hanjung [1]
Oh, Seoung Wug [2]
Lee, Joon-Young [2]
Kim, Seon Joo [1]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
[2] Adobe Res, San Francisco, CA USA
DOI
10.1109/CVPR52729.2023.01405
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Handling long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community, and existing methods have limitations in addressing it. We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. The key contribution of GenVIS is its learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment. Additionally, we introduce a memory that effectively acquires information from previous states. Thanks to this new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manners. We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS). Notably, we greatly outperform the state of the art on the long VIS benchmark (OVIS), improving by 5.6 AP with a ResNet-50 backbone. Code is available at https://github.com/miranheo/GenVIS.
Pages: 14623-14632
Page count: 10