A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Cited by: 2
Authors
Zhu, Chaoyang [1]
Chen, Long [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE;
DOI
10.1109/TPAMI.2024.3413013
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Due to the expensive manual labeling cost, the annotated categories in existing datasets are often small-scale and pre-defined; consequently, state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, in the last few years, the community has witnessed increasing attention toward Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that the permission and usage of weak supervision signals can well discriminate different methodologies, including: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, and 3D and video understanding. The main design principles, key challenges, development routes, methodology strengths, and weaknesses are thoroughly analyzed.
Pages: 8954-8975
Page count: 22