Multi-modal Prompts with Feature Decoupling for Open-Vocabulary Object Detection

Authors
Wang, Duorui [1]
Zhao, Xiaowei [1]
Affiliations
[1] Beihang Univ, State Key Lab Complex & Crit Software Environm, Beijing 100191, Peoples R China
Keywords
feature decoupling; multi-modal prompts; open-vocabulary object detection; region expansion
DOI
10.1007/978-981-97-6125-8_14
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Open-vocabulary object detection aims to recognize novel categories from text descriptions while training only on data from a limited set of categories. A prompt serves as a template for constructing the textual description of each category. As open-vocabulary object detection has developed, better-performing multi-modal prompts have emerged. However, existing multi-modal prompts fail to align the context and object components across modalities during construction. To address this issue, we propose an open-vocabulary object detection framework based on multi-modal prompts with feature decoupling. The framework consists of two modules: the construction of Multi-modal Prompts with Feature Decoupling (MPFD) and visual Region Expansion (RE). During prompt construction, MPFD decouples the object and context components of the visual embeddings and fuses each with the corresponding part of the text embeddings. RE incorporates additional context information into the visual embeddings to enhance the discriminative ability of the prompts. Extensive experiments demonstrate that multi-modal prompts with feature decoupling effectively improve the performance of open-vocabulary object detection models.
Pages: 180-194
Number of pages: 15
Related papers
50 in total
  • [1] Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
    Xu, Yifan
    Zhang, Mengdan
    Yang, Xiaoshan
    Xu, Changsheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33: 6253-6267
  • [2] Multi-Modal Prompting for Open-Vocabulary Video Visual Relationship Detection
    Yang, Shuo
    Wang, Yongqi
    Ji, Xiaofeng
    Wu, Xinxiao
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024: 6513-6521
  • [3] Open-Vocabulary Object Detection With an Open Corpus
    Wang, Jiong
    Zhang, Huiming
    Hong, Haiwen
    Jin, Xuan
    He, Yuan
    Xue, Hui
    Zhao, Zhou
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 6736-6746
  • [4] Scaling Open-Vocabulary Object Detection
    Minderer, Matthias
    Gritsenko, Alexey
    Houlsby, Neil
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [5] Simple Open-Vocabulary Object Detection
    Minderer, Matthias
    Gritsenko, Alexey
    Stone, Austin
    Neumann, Maxim
    Weissenborn, Dirk
    Dosovitskiy, Alexey
    Mahendran, Aravindh
    Arnab, Anurag
    Dehghani, Mostafa
    Shen, Zhuoran
    Wang, Xiao
    Zhai, Xiaohua
    Kipf, Thomas
    Houlsby, Neil
    COMPUTER VISION, ECCV 2022, PT X, 2022, 13670: 728-755
  • [6] Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer
    He, Sunan
    Guo, Taian
    Dai, Tao
    Qiao, Ruizhi
    Shu, Xiujun
    Ren, Bo
    Xia, Shu-Tao
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023: 808-816
  • [7] Open-World Human-Object Interaction Detection via Multi-modal Prompts
    Yang, Jie
    Li, Bingliang
    Zeng, Ailing
    Zhang, Lei
    Zhang, Ruimao
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 16954-16964
  • [8] Cross-Modal Collaboration and Robust Feature Classifier for Open-Vocabulary 3D Object Detection
    Liu, Hengsong
    Duan, Tongle
    SENSORS, 2025, 25(02)
  • [9] Open-Vocabulary Object Detection Using Captions
    Zareian, Alireza
    Dela Rosa, Kevin
    Hu, Derek Hao
    Chang, Shih-Fu
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 14388-14397
  • [10] Weakly Supervised Open-Vocabulary Object Detection
    Lin, Jianghang
    Shen, Yunhang
    Wang, Bingquan
    Lin, Shaohui
    Li, Ke
    Cao, Liujuan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024: 3404-3412