Multi-modal Prompts with Feature Decoupling for Open-Vocabulary Object Detection

Cited by: 0
Authors
Wang, Duorui [1]
Zhao, Xiaowei [1]
Affiliations
[1] Beihang Univ, State Key Lab Complex & Crit Software Environm, Beijing 100191, Peoples R China
Keywords
feature decoupling; multi-modal prompts; open-vocabulary object detection; region expansion
DOI
10.1007/978-981-97-6125-8_14
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Open-vocabulary object detection aims to recognize novel categories from text descriptions while training on data from only a limited set of categories. A prompt serves as a template for constructing the textual description of each category. As open-vocabulary object detection has developed, multi-modal prompts with better performance have emerged. However, existing multi-modal prompts fail to align the context and object components across modalities during construction. To address this issue, we propose an open-vocabulary object detection framework based on multi-modal prompts with feature decoupling. The framework consists of two modules: the construction of Multi-modal Prompts with Feature Decoupling (MPFD) and visual Region Expansion (RE). During prompt construction, MPFD decouples the object and context components from the visual embeddings and then fuses each with the corresponding part of the text embeddings. RE incorporates additional context information into the visual embeddings to enhance the discriminative ability of the prompts. Extensive experiments demonstrate that multi-modal prompts with feature decoupling effectively improve the performance of open-vocabulary object detection models.
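The abstract does not specify how the decoupling, fusion, or region expansion are computed. As a rough illustration only, the sketch below assumes one simple possibility: the object component is the mean feature inside a box, the context component is the mean feature in an enlarged box outside the original one, and fusion is a weighted average with the matching text part. The names `expand_box`, `decouple`, and `fuse`, and the parameters `scale` and `alpha`, are hypothetical and are not taken from the paper.

```python
def expand_box(box, scale, width, height):
    """Grow a box (x0, y0, x1, y1) about its centre by `scale`, clipped to the image.

    Hypothetical stand-in for region expansion; the paper's RE module may differ.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * scale / 2, (y1 - y0) * scale / 2
    return (max(0, cx - half_w), max(0, cy - half_h),
            min(width, cx + half_w), min(height, cy + half_h))


def _inside(box, row, col):
    """True if the centre of grid cell (row, col) lies inside the box."""
    return box[0] <= col + 0.5 < box[2] and box[1] <= row + 0.5 < box[3]


def _mean_pool(feature_map, cells):
    """Average the feature vectors stored at the given (row, col) cells."""
    dim = len(feature_map[0][0])
    if not cells:
        return [0.0] * dim
    return [sum(feature_map[r][c][d] for r, c in cells) / len(cells)
            for d in range(dim)]


def decouple(feature_map, box, scale=1.5):
    """Split visual features into (object, context) vectors.

    Assumption: object = cells inside the box, context = cells inside the
    expanded box but outside the original box.
    """
    height, width = len(feature_map), len(feature_map[0])
    expanded = expand_box(box, scale, width, height)
    grid = [(r, c) for r in range(height) for c in range(width)]
    obj_cells = [(r, c) for r, c in grid if _inside(box, r, c)]
    ctx_cells = [(r, c) for r, c in grid
                 if _inside(expanded, r, c) and not _inside(box, r, c)]
    return _mean_pool(feature_map, obj_cells), _mean_pool(feature_map, ctx_cells)


def fuse(text_part, visual_part, alpha=0.5):
    """Blend a text embedding part with the matching visual part (assumed form)."""
    return [(1 - alpha) * t + alpha * v for t, v in zip(text_part, visual_part)]


# Toy 4x4 feature map: object cells carry [1, 0], background cells carry [0, 1].
toy_box = (1, 1, 3, 3)
toy_map = [[[1.0, 0.0] if _inside(toy_box, r, c) else [0.0, 1.0]
            for c in range(4)] for r in range(4)]
obj_vec, ctx_vec = decouple(toy_map, toy_box, scale=2.0)
fused_obj = fuse([0.0, 0.0], obj_vec, alpha=0.5)
```

In this toy setup the object and context vectors come out cleanly separated, which is the property the decoupling step is meant to provide before each part is fused with its text counterpart.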
Pages: 180-194
Number of pages: 15