InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation

Cited by: 0

Authors:

He, Fei [1,2]

Zhang, Haoyang [4]

Gao, Naiyu [4]

Jia, Jian [1,2]

Shan, Yanhu [4]

Zhao, Xin [1,2]

Huang, Kaiqi [1,2,3]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, CRISE, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Shanghai, Peoples R China

[4] Horizon Robot, Beijing, Peoples R China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video instance segmentation (VIS) aims at segmenting and tracking objects in videos. Prior methods typically generate frame-level or clip-level object instances first and then associate them using either additional tracking heads or complex instance matching algorithms. This explicit instance association approach increases system complexity and fails to fully exploit temporal cues in videos. In this paper, we design a simple, fast and yet effective query-based framework for online VIS. Relying on an instance query and proposal propagation mechanism with several specially developed components, this framework can perform accurate instance association implicitly. Specifically, we generate frame-level object instances based on a set of instance query-proposal pairs propagated from previous frames. Each instance query-proposal pair is learned to bind to one specific object across frames through carefully designed strategies. When such a pair is used to predict an object instance on the current frame, not only is the generated instance automatically associated with its precursors on previous frames, but the model also gets a good prior for predicting the same object. In this way, we naturally achieve implicit instance association in parallel with segmentation and elegantly take advantage of temporal cues in videos. To show the effectiveness of our method InsPro, we evaluate it on two popular VIS benchmarks, i.e., YouTube-VIS 2019 and YouTube-VIS 2021. Without bells-and-whistles, our InsPro with ResNet-50 backbone achieves 43.2 AP and 37.6 AP on these two benchmarks respectively, outperforming all other online VIS methods.
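The propagation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Slot` structure, the `toy_frame_head` function, and the use of a fixed per-frame shift in place of a learned prediction head are all assumptions made purely for illustration. The sketch only shows why carrying each query-proposal pair forward makes association implicit: an object's identity is simply the index of its slot, so no matching step is needed.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """One propagated query-proposal pair (hypothetical toy structure)."""
    query: float  # stand-in for a learned instance query embedding
    box: tuple    # proposal box as (x1, y1, x2, y2)


def toy_frame_head(slot, frame_shift):
    """Stand-in for a per-frame prediction head (assumption, not the real model).

    A real head would refine the box and query from image features; here the
    box is just translated by the frame's (dx, dy) shift.
    """
    x1, y1, x2, y2 = slot.box
    dx, dy = frame_shift
    return Slot(query=slot.query, box=(x1 + dx, y1 + dy, x2 + dx, y2 + dy))


def run_online(slots, frame_shifts):
    """Process frames one by one, propagating every slot to the next frame.

    Returns a dict mapping slot index -> list of boxes, one per frame.
    The slot index acts as the instance identity across frames.
    """
    tracks = {i: [] for i in range(len(slots))}
    for shift in frame_shifts:
        # The output for this frame becomes the input prior for the next one.
        slots = [toy_frame_head(s, shift) for s in slots]
        for i, s in enumerate(slots):
            tracks[i].append(s.box)
    return tracks


if __name__ == "__main__":
    initial = [Slot(0.1, (0, 0, 10, 10)), Slot(0.9, (50, 50, 60, 60))]
    print(run_online(initial, [(1, 0), (1, 0)]))
```

In this sketch the previous frame's output box also serves as the starting proposal for the next frame, which mirrors the "good prior" described in the abstract at a very coarse level.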


Pages: 14
