InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation

Cited by: 0
Authors
He, Fei [1, 2]
Zhang, Haoyang [4]
Gao, Naiyu [4]
Jia, Jian [1, 2]
Shan, Yanhu [4]
Zhao, Xin [1, 2]
Huang, Kaiqi [1, 2, 3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, CRISE, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Shanghai, Peoples R China
[4] Horizon Robot, Beijing, Peoples R China
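Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract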
Video instance segmentation (VIS) aims to segment and track objects in videos. Prior methods typically generate frame-level or clip-level object instances first and then associate them using either additional tracking heads or complex instance-matching algorithms. This explicit instance association increases system complexity and fails to fully exploit temporal cues in videos. In this paper, we design a simple, fast, yet effective query-based framework for online VIS. Relying on an instance query and proposal propagation mechanism with several specially developed components, this framework performs accurate instance association implicitly. Specifically, we generate frame-level object instances from a set of instance query-proposal pairs propagated from previous frames. Each instance query-proposal pair learns to bind to one specific object across frames through carefully designed strategies. When such a pair is used to predict an object instance on the current frame, not only is the generated instance automatically associated with its precursors on previous frames, but the model also gets a good prior for predicting the same object. In this way, we naturally achieve implicit instance association in parallel with segmentation and take advantage of temporal cues in videos. To show the effectiveness of our method, InsPro, we evaluate it on two popular VIS benchmarks, YouTube-VIS 2019 and YouTube-VIS 2021. Without bells and whistles, InsPro with a ResNet-50 backbone achieves 43.2 AP and 37.6 AP on these two benchmarks, respectively, outperforming all other online VIS methods.
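The propagation idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Slot` class, the `refine` function, and the nearest-center rule are hypothetical stand-ins for the learned prediction head, and the "query" is reduced to a plain number. The sketch only shows how carrying a query-proposal pair from frame to frame makes the slot index serve as the track identity, so no separate matching step runs after prediction.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    query: float   # stand-in for a learned instance embedding (assumption)
    box: tuple     # proposal box (x1, y1, x2, y2)


def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def refine(slot, frame_objects):
    """Hypothetical stand-in for the network head.

    A real model would predict the instance from image features, using the
    propagated query and proposal as a prior. Here we simply pick the object
    whose center is nearest to the propagated proposal.
    """
    cx, cy = center(slot.box)
    best = min(
        frame_objects,
        key=lambda b: (center(b)[0] - cx) ** 2 + (center(b)[1] - cy) ** 2,
    )
    # The refined pair is what gets propagated to the next frame.
    return Slot(query=slot.query, box=best)


def run_online(frames, init_slots):
    """Process frames one by one, propagating each slot forward."""
    slots = list(init_slots)
    tracks = {i: [] for i in range(len(slots))}
    for objects in frames:
        slots = [refine(s, objects) for s in slots]
        for i, s in enumerate(slots):
            tracks[i].append(s.box)  # slot index i is the instance identity
    return tracks


# Two objects drifting right; the per-frame object order is shuffled on purpose.
frames = [
    [(0, 0, 10, 10), (100, 100, 110, 110)],
    [(102, 100, 112, 110), (2, 0, 12, 10)],
    [(4, 0, 14, 10), (104, 100, 114, 110)],
]
init_slots = [Slot(0.0, (0, 0, 10, 10)), Slot(1.0, (100, 100, 110, 110))]
tracks = run_online(frames, init_slots)
```

In this toy setting each slot keeps following the same object even though the order of objects changes between frames, which mirrors the abstract's claim that association comes for free from the propagated pair.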
Pages: 14
Related papers
50 in total
  • [21] Contour proposal networks for biomedical instance segmentation
    Upschulte, Eric
    Harmeling, Stefan
    Amunts, Katrin
    Dickscheid, Timo
    [J]. MEDICAL IMAGE ANALYSIS, 2022, 77
  • [22] Learnable Query Initialization for Surgical Instrument Instance Segmentation
    Dhanakshirur, Rohan Raju
    Shastry, K. N. Ajay
    Borgavi, Kaustubh
    Suri, Ashish
    Kalra, Prem Kumar
    Arora, Chetan
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IX, 2023, 14228 : 728 - 738
  • [23] Dual Embedding Learning for Video Instance Segmentation
    Feng, Qianyu
    Yang, Zongxin
    Li, Peike
    Wei, Yunchao
    Yang, Yi
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 717 - 720
  • [24] Mask-Free Video Instance Segmentation
    Ke, Lei
    Danelljan, Martin
    Ding, Henghui
    Tai, Yu-Wing
    Tang, Chi-Keung
    Yu, Fisher
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 22857 - 22866
  • [25] Video Instance Segmentation in an Open-World
    Thawakar, Omkar
    Narayan, Sanath
    Cholakkal, Hisham
    Anwer, Rao Muhammad
    Khan, Salman
    Laaksonen, Jorma
    Shah, Mubarak
    Khan, Fahad Shahbaz
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024,
  • [26] Learning Hierarchical Embeddings for Video Instance Segmentation
    Qin, Zheyun
    Lu, Xiankai
    Nie, Xiushan
    Zhen, Xiantong
    Yin, Yilong
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1884 - 1892
  • [27] DVIS: Decoupled Video Instance Segmentation Framework
    Zhang, Tao
    Tian, Xingye
    Wu, Yu
    Ji, Shunping
    Wang, Xuebo
    Zhang, Yuan
    Wan, Pengfei
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1282 - 1291
  • [28] SeqFormer: Sequential Transformer for Video Instance Segmentation
    Wu, Junfeng
    Jiang, Yi
    Bai, Song
    Zhang, Wenqing
    Bai, Xiang
    [J]. COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 553 - 569
  • [29] MaskRNN: Instance Level Video Object Segmentation
    Hu, Yuan-Ting
    Huang, Jia-Bin
    Schwing, Alexander G.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [30] Semantic Instance Meets Salient Object: Study on Video Semantic Salient Instance Segmentation
    Trung-Nghia Le
    Sugimoto, Akihiro
    [J]. 2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 1779 - 1788