Multi-modal Prompts with Feature Decoupling for Open-Vocabulary Object Detection

Cited by: 0

Authors:

Wang, Duorui [1]

Zhao, Xiaowei [1]

Affiliations:

[1] Beihang Univ, State Key Lab Complex & Crit Software Environm, Beijing 100191, Peoples R China

Source:

GENERALIZING FROM LIMITED RESOURCES IN THE OPEN WORLD, GLOW-IJCAI 2024 | 2024 / Vol. 2160

Keywords:

feature decoupling; multi-modal prompts; open-vocabulary object detection; region expansion

DOI:

10.1007/978-981-97-6125-8_14

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Open-vocabulary object detection aims to recognize novel categories from text descriptions while training on data from a limited set of categories. A prompt serves as a template that helps construct the textual description of each category. As open-vocabulary object detection has developed, multi-modal prompts with better performance have emerged. However, existing multi-modal prompts fail to align the context and object components across modalities during construction. To address this issue, we propose an open-vocabulary object detection framework based on multi-modal prompts with feature decoupling. The framework consists of two modules: the construction of Multi-modal Prompts with Feature Decoupling (MPFD) and visual Region Expansion (RE). During prompt construction, MPFD decouples the object and context components from the visual embeddings and then fuses each with the corresponding part of the text embeddings. RE incorporates additional context information into the visual embeddings to enhance the discriminative ability of the prompts. Extensive experiments demonstrate that multi-modal prompts with feature decoupling can effectively improve the performance of open-vocabulary object detection models.
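
The abstract does not say how the decoupling, fusion, or region expansion are computed, so the following is only a toy sketch of the general idea, not the authors' method. It assumes, purely for illustration, that a region is a grid of patch embeddings, that region expansion grows the box around its centre, that the object and context components are the mean embeddings inside and outside the original box, and that fusion is a weighted average. All names (`expand_box`, `decouple`, `fuse`, `alpha`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def expand_box(box: Box, ratio: float, width: int, height: int) -> Box:
    """Grow a box around its centre by `ratio`, clipped to the grid (assumed form of RE)."""
    x0, y0, x1, y1 = box
    dx = int(round((x1 - x0) * (ratio - 1) / 2))
    dy = int(round((y1 - y0) * (ratio - 1) / 2))
    return (max(0, x0 - dx), max(0, y0 - dy), min(width, x1 + dx), min(height, y1 + dy))


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def decouple(grid: List[List[Vector]], box: Box, expanded: Box) -> Tuple[Vector, Vector]:
    """Split the expanded region into an object mean and a context mean (assumed decoupling)."""
    x0, y0, x1, y1 = box
    ex0, ey0, ex1, ey1 = expanded
    obj, ctx = [], []
    for y in range(ey0, ey1):
        for x in range(ex0, ex1):
            inside = x0 <= x < x1 and y0 <= y < y1
            (obj if inside else ctx).append(grid[y][x])
    return mean(obj), mean(ctx)


def fuse(text_part: Vector, visual_part: Vector, alpha: float = 0.5) -> Vector:
    """Weighted average of matching text and visual components (assumed fusion rule)."""
    return [alpha * t + (1 - alpha) * v for t, v in zip(text_part, visual_part)]


# Toy 4x4 grid: object patches are [1, 0], surrounding context patches are [0, 1].
box = (1, 1, 3, 3)
grid = [[[1.0, 0.0] if 1 <= x < 3 and 1 <= y < 3 else [0.0, 1.0]
         for x in range(4)] for y in range(4)]

expanded = expand_box(box, ratio=2.0, width=4, height=4)
visual_object, visual_context = decouple(grid, box, expanded)

# Hypothetical text embeddings for the context template and the class name.
text_context, text_object = [0.5, 0.5], [1.0, 1.0]

# Each visual component is fused only with its matching text component.
prompt = fuse(text_context, visual_context) + fuse(text_object, visual_object)
print(prompt)
```

The point of the sketch is the pairing: the context part of the visual embedding is combined only with the context part of the text embedding, and likewise for the object part, rather than fusing whole embeddings across modalities.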


Pages: 180-194

Number of pages: 15
