Video Instance Segmentation in an Open-World

Authors
Thawakar, Omkar [1]
Narayan, Sanath [2]
Cholakkal, Hisham [1]
Anwer, Rao Muhammad [1,3]
Khan, Salman [1]
Laaksonen, Jorma [3]
Shah, Mubarak [4]
Khan, Fahad Shahbaz [1,5]
Affiliations
[1] Mohamed bin Zayed Univ AI, Abu Dhabi, U Arab Emirates
[2] Technol Innovat Inst, Abu Dhabi, U Arab Emirates
[3] Aalto Univ, Espoo, Finland
[4] Univ Cent Florida, Orlando, FL USA
[5] Linkoping Univ, Linkoping, Sweden
Funding
Swedish Research Council;
Keywords
Open-world segmentation; Video instance segmentation; Object-detection; Video object detection; Transformers;
DOI
10.1007/s11263-024-02195-4
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Existing video instance segmentation (VIS) approaches generally follow a closed-world assumption, where only instances of seen categories are identified and spatio-temporally segmented at inference. The open-world formulation relaxes the closed-world static-learning assumption as follows: (a) it first distinguishes a set of known categories and labels any unknown object as 'unknown', and then (b) it incrementally learns the class of an unknown object as and when the corresponding semantic labels become available. We propose the first open-world VIS approach, named OW-VISFormer, that introduces a novel feature enrichment mechanism and a spatio-temporal objectness (STO) module. The feature enrichment mechanism, based on a lightweight auxiliary network, aims at accurate pixel-level delineation of (unknown) objects from the background as well as at distinguishing category-specific known semantic classes. The STO module strives to generate instance-level pseudo-labels by enhancing the foreground activations through a contrastive loss. Moreover, we also introduce an extensive experimental protocol to measure the characteristics of OW-VIS. Our OW-VISFormer performs favorably against a solid baseline in the OW-VIS setting. Further, we evaluate our contributions in the standard fully-supervised VIS setting by integrating them into the recent SeqFormer, achieving an absolute gain of 1.6% AP on the YouTube-VIS 2019 validation set. Lastly, we show the generalizability of our contributions for the open-world detection (OWOD) setting, outperforming the best existing OWOD method in the literature. Code and models, along with the OW-VIS splits, are available at https://github.com/OmkarThawakar/OWVISFormer.
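The two-step open-world formulation in the abstract, (a) flag anything outside the known categories as 'unknown' and (b) absorb new categories once labels arrive, can be illustrated with a toy sketch. This is not the OW-VISFormer model or its code: the class `OpenWorldLabeler`, the per-category score dictionary, and the 0.5 confidence threshold are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field

UNKNOWN = "unknown"


@dataclass
class OpenWorldLabeler:
    """Toy open-world labeler (hypothetical; not the OW-VISFormer model)."""

    known: set = field(default_factory=set)
    threshold: float = 0.5  # assumed confidence cut-off, chosen for illustration

    def label(self, scores: dict) -> str:
        # Consider only the categories that have been learned so far.
        candidates = {c: s for c, s in scores.items() if c in self.known}
        if candidates:
            best = max(candidates, key=candidates.get)
            if candidates[best] >= self.threshold:
                return best
        # Step (a): everything else is flagged as 'unknown'.
        return UNKNOWN

    def learn(self, new_categories) -> None:
        # Step (b): semantic labels for former unknowns become available.
        self.known.update(new_categories)


labeler = OpenWorldLabeler(known={"person", "dog"})
scores = {"person": 0.1, "dog": 0.2, "zebra": 0.9}  # made-up scores

before = labeler.label(scores)  # 'zebra' is not yet a known category
labeler.learn({"zebra"})
after = labeler.label(scores)   # the same object after incremental learning
print(before, after)
```

In this sketch the same object is reported as 'unknown' before its category is introduced and by its category name afterwards, which is the behaviour the open-world setting asks of a model.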
Pages: 12