InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation

Cited by: 0
Authors
He, Fei [1 ,2 ]
Zhang, Haoyang [4 ]
Gao, Naiyu [4 ]
Jia, Jian [1 ,2 ]
Shan, Yanhu [4 ]
Zhao, Xin [1 ,2 ]
Huang, Kaiqi [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, CRISE, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Shanghai, Peoples R China
[4] Horizon Robot, Beijing, Peoples R China
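Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract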
Video instance segmentation (VIS) aims to segment and track objects in videos. Prior methods typically generate frame-level or clip-level object instances first and then associate them using either additional tracking heads or complex instance matching algorithms. This explicit instance association increases system complexity and fails to fully exploit temporal cues in videos. In this paper, we design a simple, fast, yet effective query-based framework for online VIS. Relying on an instance query and proposal propagation mechanism with several specially developed components, this framework performs accurate instance association implicitly. Specifically, we generate frame-level object instances from a set of instance query-proposal pairs propagated from previous frames. Each instance query-proposal pair learns to bind to one specific object across frames through carefully designed strategies. When such a pair is used to predict an object instance on the current frame, not only is the generated instance automatically associated with its precursors on previous frames, but the model also gets a good prior for predicting the same object. In this way, we naturally achieve implicit instance association in parallel with segmentation and take advantage of temporal cues in videos. To show the effectiveness of our method, InsPro, we evaluate it on two popular VIS benchmarks, YouTube-VIS 2019 and YouTube-VIS 2021. Without bells and whistles, InsPro with a ResNet-50 backbone achieves 43.2 AP and 37.6 AP on these two benchmarks, respectively, outperforming all other online VIS methods.
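The propagation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `toy_head` and `run_online` are hypothetical, the "query" is a plain list of floats that the toy never updates, and the learned prediction head is replaced by a simple rule that snaps each proposal to the best-overlapping box in the current frame. What the sketch is meant to show is only the bookkeeping: each query-proposal pair occupies a fixed slot, the pair refined on one frame becomes the input for the next, and the slot index serves as the track identity, so no separate cross-frame matching step runs after prediction.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Pair = Tuple[List[float], Box]            # (instance query, proposal box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def toy_head(pair: Pair, frame: Sequence[Box]) -> Tuple[Pair, Optional[Box]]:
    """Assumed placeholder for a learned head.

    It refines the proposal to the frame box that overlaps it most and
    leaves the query untouched. A real model would regress the box and
    update the query from image features instead.
    """
    query, proposal = pair
    best = max(frame, key=lambda box: iou(proposal, box), default=None)
    if best is None or iou(proposal, best) == 0.0:
        return (query, proposal), None    # nothing found; keep the old prior
    return (query, best), best


def run_online(
    frames: Sequence[Sequence[Box]],
    init_pairs: Sequence[Pair],
    head: Callable[[Pair, Sequence[Box]], Tuple[Pair, Optional[Box]]] = toy_head,
) -> List[List[Optional[Box]]]:
    """Process frames one at a time, propagating pairs between frames."""
    pairs = list(init_pairs)
    tracks: List[List[Optional[Box]]] = [[] for _ in pairs]
    for frame in frames:
        outputs = [head(pair, frame) for pair in pairs]
        pairs = [new_pair for new_pair, _ in outputs]   # propagate to next frame
        for slot, (_, prediction) in enumerate(outputs):
            tracks[slot].append(prediction)             # slot index = identity
    return tracks


# Two objects drifting right; the box order inside frame 2 is swapped on purpose.
frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(52, 50, 62, 60), (2, 0, 12, 10)],
    [(4, 0, 14, 10), (54, 50, 64, 60)],
]
init_pairs = [([0.0], (0, 0, 10, 10)), ([1.0], (50, 50, 60, 60))]
tracks = run_online(frames, init_pairs)
```

In this toy run, `tracks[0]` follows the object that starts at the origin and `tracks[1]` follows the other one, even though the per-frame box order changes, because each slot carries its own proposal forward as a prior.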
Pages: 14
Related papers
50 in total
  • [1] Efficient Video Instance Segmentation via Tracklet Query and Proposal
    Wu, Jialian
    Yarram, Sudhir
    Liang, Hui
    Lan, Tian
    Yuan, Junsong
    Eledath, Jayan
    Medioni, Gerard
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 949 - 958
  • [2] Instance as Identity: A Generic Online Paradigm for Video Instance Segmentation
    Zhu, Feng
    Yang, Zongxin
    Yu, Xin
    Yang, Yi
    Wei, Yunchao
    [J]. COMPUTER VISION, ECCV 2022, PT XXIX, 2022, 13689 : 524 - 540
  • [3] In Defense of Online Models for Video Instance Segmentation
    Wu, Junfeng
    Liu, Qihao
    Jiang, Yi
    Bai, Song
    Yuille, Alan
    Bai, Xiang
    [J]. COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 588 - 605
  • [4] InstanceFormer: An Online Video Instance Segmentation Framework
    Koner, Rajat
    Hannan, Tanveer
    Shit, Suprosanna
    Sharifzadeh, Sahand
    Schubert, Matthias
    Seidl, Thomas
    Tresp, Volker
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 1188 - 1195
  • [5] InstanceFormer: An Online Video Instance Segmentation Framework
    Ludwig Maximilian University of Munich, Germany
    Unknown
    [J]. arXiv, 1600,
  • [6] Hybrid Instance-Aware Temporal Fusion for Online Video Instance Segmentation
    Li, Xiang
    Wang, Jinglu
    Li, Xiao
    Lu, Yan
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1429 - 1437
  • [7] Adapting Video Instance Segmentation for Instance Search
    Nguyen, An Thi
    [J]. 20TH INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING, CBMI 2023, 2023, : 256 - 260
  • [8] Video Instance Segmentation by Instance Flow Assembly
    Li, Xiang
    Wang, Jinglu
    Li, Xiao
    Lu, Yan
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7469 - 7479
  • [9] Video Instance Segmentation
    Yang, Linjie
    Fan, Yuchen
    Xu, Ning
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5187 - 5196
  • [10] TCOVIS: Temporally Consistent Online Video Instance Segmentation
    Li, Junlong
    Yu, Bingyao
    Rao, Yongming
    Zhou, Jie
    Lu, Jiwen
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1097 - 1107