Multi-modal Prompts with Feature Decoupling for Open-Vocabulary Object Detection

Times cited: 0

Authors:

Wang, Duorui [1]

Zhao, Xiaowei [1]

Affiliation:

[1] Beihang Univ, State Key Lab Complex & Crit Software Environm, Beijing 100191, Peoples R China

Source:

GENERALIZING FROM LIMITED RESOURCES IN THE OPEN WORLD, GLOW-IJCAI 2024 | 2024 / Vol. 2160

Keywords:

feature decoupling; multi-modal prompts; open-vocabulary object detection; region expansion

DOI:

10.1007/978-981-97-6125-8_14

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline codes:

081104; 0812; 0835; 1405

Abstract:

Open-vocabulary object detection aims to recognize novel categories from their text descriptions while training only on data from a limited set of categories. A prompt serves as a template that helps construct the textual description of each category. As open-vocabulary object detection has developed, better-performing multi-modal prompts have emerged. However, existing multi-modal prompts fail to align the context and object components across modalities during prompt construction. To address this issue, we propose an open-vocabulary object detection framework based on multi-modal prompts with feature decoupling. The framework consists of two modules: the construction of Multi-modal Prompts with Feature Decoupling (MPFD) and visual Region Expansion (RE). During prompt construction, MPFD decouples the object and context components of the visual embeddings and then fuses each of them with the corresponding part of the text embeddings. RE incorporates additional context information into the visual embeddings to enhance the discriminative ability of the prompts. Extensive experiments demonstrate that multi-modal prompts with feature decoupling effectively improve the performance of open-vocabulary object detection models.
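The abstract describes MPFD and RE only at a high level. The following is a hypothetical toy sketch to make the data flow concrete; it is not the authors' implementation. Everything in it is an assumption: the fixed linear projections `w_obj` and `w_ctx` standing in for the decoupling step, additive fusion weighted by `alpha`, modelling region expansion as scaling a box about its centre, mean pooling as a stand-in for a text encoder, and all function names.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def expand_region(box: Box, scale: float, img_w: float, img_h: float) -> Box:
    """Hypothetical region expansion: enlarge a box about its centre by `scale`
    and clip it to the image, so pooled features also cover surrounding context."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(img_w, cx + half_w), min(img_h, cy + half_h))


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x d) by vector v (length d)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add_scaled(a: Vector, b: Vector, weight: float) -> Vector:
    """Return a + weight * b, element-wise."""
    return [x + weight * y for x, y in zip(a, b)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def decoupled_prompt(visual: Vector, w_obj: Matrix, w_ctx: Matrix,
                     ctx_tokens: List[Vector], class_token: Vector,
                     alpha: float) -> List[Vector]:
    """Assumed decoupled fusion: project the visual embedding into an object part
    and a context part, add the context part to every context (template) token,
    and add the object part to the class-name token."""
    v_obj = matvec(w_obj, visual)
    v_ctx = matvec(w_ctx, visual)
    fused_ctx = [add_scaled(t, v_ctx, alpha) for t in ctx_tokens]
    fused_cls = add_scaled(class_token, v_obj, alpha)
    return fused_ctx + [fused_cls]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Stand-in for a text encoder: average the prompt token embeddings."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def classify(region_feat: Vector, prompts: Dict[str, List[Vector]]) -> str:
    """Pick the category whose pooled prompt is most similar to the region feature."""
    return max(prompts, key=lambda name: cosine(region_feat, mean_pool(prompts[name])))


if __name__ == "__main__":
    visual = [1.0, 2.0]               # toy region feature (assumed already expanded)
    w_obj = [[1.0, 0.0], [0.0, 0.0]]  # toy projection: first dim treated as "object"
    w_ctx = [[0.0, 0.0], [0.0, 1.0]]  # toy projection: second dim treated as "context"
    prompts = {
        "cat": decoupled_prompt(visual, w_obj, w_ctx, [[0.0, 0.0]], [1.0, 1.0], 0.5),
        "dog": decoupled_prompt(visual, w_obj, w_ctx, [[0.0, 0.0]], [-1.0, 1.0], 0.5),
    }
    print(expand_region((10.0, 10.0, 20.0, 20.0), 2.0, 100.0, 100.0))
    print(classify(visual, prompts))
```

The design choice illustrated is only the separation itself: the visual context part is routed to the template tokens and the visual object part to the category token, rather than one undivided visual embedding being added to every prompt token. How the paper actually learns the decoupling and performs the fusion is not stated in this record.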

Pages: 180-194

Page count: 15