LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

Authors
Du, Penghui [1, 2, 3]
Wang, Yu [2]
Sung, Yifan [2]
Wang, Luting [1]
Li, Yue [1]
Zhang, Gang [2]
Ding, Errui [2]
Wang, Yan [3]
Wang, Jingdong [2]
Liu, Si [1]
Affiliations
[1] Beihang Univ, Beijing, Peoples R China
[2] Baidu, Beijing, Peoples R China
[3] Tsinghua Univ, AIR, Beijing, Peoples R China
Source
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Inter-category Relationships; Language Model; DETR;
DOI
10.1007/978-3-031-73337-6_18
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) a deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge; and (2) an overfitting tendency towards base categories, with the open-vocabulary knowledge biased towards base categories during the transfer from VLMs to detectors. To address these challenges, we propose the Language Model Instruction (LaMI) strategy, which leverages the relationships between visual concepts and applies them within a simple yet effective DETR-like detector, termed LaMI-DETR. LaMI utilizes GPT to construct visual concepts and employs T5 to investigate visual similarities across categories. These inter-category relationships refine concept representation and avoid overfitting to base categories. Comprehensive experiments validate our approach's superior performance over existing methods in the same rigorous setting without reliance on external training resources. LaMI-DETR achieves a rare box AP of 43.4 on OV-LVIS, surpassing the previous best by 7.8 rare box AP.
Pages: 312-328
Page count: 17