LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

Cited by: 0
Authors
Du, Penghui [1 ,2 ,3 ]
Wang, Yu [2 ]
Sun, Yifan [2]
Wang, Luting [1 ]
Li, Yue [1 ]
Zhang, Gang [2 ]
Ding, Errui [2 ]
Wang, Yan [3 ]
Wang, Jingdong [2 ]
Liu, Si [1 ]
Affiliations
[1] Beihang Univ, Beijing, Peoples R China
[2] Baidu, Beijing, Peoples R China
[3] Tsinghua Univ, AIR, Beijing, Peoples R China
Source
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
Inter-category Relationships; Language Model; DETR;
DOI
10.1007/978-3-031-73337-6_18
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) a deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge; and (2) an overfitting tendency towards base categories, where the open-vocabulary knowledge is biased towards base categories during the transfer from VLMs to detectors. To address these challenges, we propose the Language Model Instruction (LaMI) strategy, which leverages the relationships between visual concepts and applies them within a simple yet effective DETR-like detector, termed LaMI-DETR. LaMI utilizes GPT to construct visual concepts and employs T5 to investigate visual similarities across categories. These inter-category relationships refine concept representation and avoid overfitting to base categories. Comprehensive experiments validate our approach's superior performance over existing methods in the same rigorous setting, without reliance on external training resources. LaMI-DETR achieves a rare box AP of 43.4 on OV-LVIS, surpassing the previous best by 7.8 rare box AP.
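The abstract's central mechanism, scoring inter-category visual similarity from language-model-generated concept descriptions, can be illustrated with a short sketch. The following is a minimal illustration only, not the authors' released implementation: the category descriptions, the sentence-t5 checkpoint, and the 0.8 threshold are all assumptions chosen for demonstration.

```python
# Minimal sketch (NOT the authors' code) of the inter-category
# similarity idea from the abstract: GPT-style visual concept
# descriptions are embedded with a T5-based text encoder, and pairwise
# cosine similarities flag visually confusable category pairs that a
# detector could then handle carefully to curb base-category overfitting.
from sentence_transformers import SentenceTransformer, util

# Hypothetical GPT-generated visual descriptions; the real method would
# query GPT for each category in the detection vocabulary.
concepts = {
    "sofa":  "a long upholstered seat with a back and two arms",
    "couch": "a padded piece of furniture seating several people",
    "bench": "a long, hard seat, often wooden and without upholstery",
}

# A Sentence-T5 encoder produces one embedding per description.
model = SentenceTransformer("sentence-transformers/sentence-t5-base")
names = list(concepts)
emb = model.encode([concepts[n] for n in names], normalize_embeddings=True)

# Pairwise cosine similarity stands in for inter-category visual
# similarity; 0.8 is an arbitrary demonstration threshold.
sim = util.cos_sim(emb, emb)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        tag = "confusable" if sim[i, j] > 0.8 else "distinct"
        print(f"{names[i]:>6} vs {names[j]:<6} sim={sim[i, j]:.2f} ({tag})")
```

Per the abstract, such similarity scores serve two roles in LaMI-DETR: refining concept representations and keeping visually similar categories out of the negative set during training; the sketch shows only the scoring step.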
Pages: 312 - 328
Number of pages: 17
Related papers
50 records in total
  • [1] Open-Vocabulary DETR with Conditional Matching
    Zang, Yuhang
    Li, Wei
    Zhou, Kaiyang
    Huang, Chen
    Loy, Chen Change
    COMPUTER VISION, ECCV 2022, PT IX, 2022, 13669 : 106 - 122
  • [2] Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection
    Li, Liangqi
    Miao, Jiaxu
    Shi, Dahu
    Tan, Wenming
    Ren, Ye
    Yang, Yi
    Pu, Shiliang
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 6478 - 6487
  • [3] Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model
    Du, Yu
    Wei, Fangyun
    Zhang, Zihe
    Shi, Miaojing
    Gao, Yue
    Li, Guoqi
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 14064 - 14073
  • [4] A Hybrid Language Model for Open-Vocabulary Thai LVCSR
    Thangthai, Kwanchiva
    Chotimongkol, Ananlada
    Wutiwiwatchai, Chai
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2206 - 2210
  • [5] Open-vocabulary Attribute Detection
    Bravo, Maria A.
    Mittal, Sudhanshu
    Ging, Simon
    Brox, Thomas
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7041 - 7050
  • [6] LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation
    Shi, Hengcan
    Dao, Son Duy
    Cai, Jianfei
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, 133 (02) : 742 - 759
  • [7] Open-Vocabulary Object Detection With an Open Corpus
    Wang, Jiong
    Zhang, Huiming
    Hong, Haiwen
    Jin, Xuan
    He, Yuan
    Xue, Hui
    Zhao, Zhou
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 6736 - 6746
  • [8] Prompt-guided DETR with RoI-pruned masked attention for open-vocabulary object detection
    Song, Hwanjun
    Bang, Jihwan
    PATTERN RECOGNITION, 2024, 155
  • [9] Localized Vision-Language Matching for Open-vocabulary Object Detection
    Bravo, Maria A.
    Mittal, Sudhanshu
    Brox, Thomas
    PATTERN RECOGNITION, DAGM GCPR 2022, 2022, 13485 : 393 - 408
  • [10] Scaling Open-Vocabulary Object Detection
    Minderer, Matthias
    Gritsenko, Alexey
    Houlsby, Neil
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023