Multi-modal Prompts with Feature Decoupling for Open-Vocabulary Object Detection

Cited by: 0

Authors:

Wang, Duorui [1]

Zhao, Xiaowei [1]

Affiliations:

[1] Beihang Univ, State Key Lab Complex & Crit Software Environm, Beijing 100191, Peoples R China

Source:

GENERALIZING FROM LIMITED RESOURCES IN THE OPEN WORLD, GLOW-IJCAI 2024 | 2024 / Vol. 2160

Keywords:

feature decoupling; multi-modal prompts; open-vocabulary object detection; region expansion

DOI:

10.1007/978-981-97-6125-8_14

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Open-vocabulary object detection aims to recognize novel categories from text descriptions while training on data from only a limited set of categories. A prompt serves as a template that helps construct the textual description of each category. As open-vocabulary object detection has developed, multi-modal prompts with better performance have emerged. However, existing multi-modal prompts fail to align the context and object components across modalities during construction. To address this issue, we propose an open-vocabulary object detection framework based on multi-modal prompts with feature decoupling. The framework consists of two modules: the construction of Multi-modal Prompts with Feature Decoupling (MPFD) and visual Region Expansion (RE). During prompt construction, MPFD decouples the object and context components from the visual embeddings and then fuses each with the corresponding part of the text embeddings. RE incorporates additional context information into the visual embeddings to enhance the discriminative ability of the prompts. Extensive experiments demonstrate that multi-modal prompts with feature decoupling effectively improve the performance of open-vocabulary object detection models.
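
The abstract does not say how RE gathers the additional context. As a purely illustrative sketch, one simple possibility is to enlarge each region box about its center before visual embeddings are extracted, so the crop includes surrounding pixels. The function name `expand_box`, the `scale` parameter, and the clipping behavior are all assumptions made for this example, not details taken from the paper.

```python
def expand_box(box, scale, image_size):
    """Enlarge an (x1, y1, x2, y2) box about its center by `scale`.

    Hypothetical helper: the paper's actual region expansion may differ.
    The result is clipped to the image bounds given as (width, height).
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    # Center of the original box.
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Half-extents of the enlarged box.
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(width), cx + half_w),
        min(float(height), cy + half_h),
    )


# A 20x20 box in the middle of a 100x100 image, doubled in size.
expanded = expand_box((40, 40, 60, 60), scale=2.0, image_size=(100, 100))
```

With `scale=1.0` the box is returned unchanged (apart from conversion to floats), so the expansion factor can be treated as a tunable setting in this sketch.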


Pages: 180-194

Page count: 15

