Multi-modal Prompts with Feature Decoupling for Open-Vocabulary Object Detection

Cited by: 0

Authors:

Wang, Duorui [1]

Zhao, Xiaowei [1]

Affiliations:

[1] Beihang Univ, State Key Lab Complex & Crit Software Environm, Beijing 100191, Peoples R China

Source:

GENERALIZING FROM LIMITED RESOURCES IN THE OPEN WORLD, GLOW-IJCAI 2024 | 2024 / Vol. 2160

Keywords:

feature decoupling; multi-modal prompts; open-vocabulary object detection; region expansion

DOI:

10.1007/978-981-97-6125-8_14

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Open-vocabulary object detection aims to recognize novel categories through text descriptions while training on data from a limited set of categories. The prompt serves as a template that assists in constructing textual descriptions of categories. As open-vocabulary object detection has developed, multi-modal prompts with better performance have emerged. However, existing multi-modal prompts fail to align the context and object components across modalities during construction. To address this issue, we propose an open-vocabulary object detection framework based on multi-modal prompts with feature decoupling. The framework consists of two modules: the construction of Multi-modal Prompts with Feature Decoupling (MPFD) and visual Region Expansion (RE). During prompt construction, MPFD decouples the object and context components from the visual embeddings and then fuses each with the corresponding part of the text embeddings. RE incorporates additional context information into the visual embeddings to enhance the discriminative ability of the prompts. Extensive experiments demonstrate that multi-modal prompts with feature decoupling effectively improve the performance of open-vocabulary object detection models.
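
The abstract does not say how the decoupling or the fusion is computed, so the following is only a minimal illustrative sketch of the general idea (split a visual embedding into object and context parts, then fuse each part with the matching part of the text embedding). The element-wise gate, the weighted-average fusion, and the names decouple, fuse, build_prompt, object_gate and alpha are all assumptions made for illustration, not the authors' MPFD module.

```python
from typing import List, Tuple

Vector = List[float]


def decouple(visual: Vector, object_gate: Vector) -> Tuple[Vector, Vector]:
    """Split a visual embedding into (object, context) parts.

    ASSUMPTION: an element-wise gate in [0, 1] stands in for whatever
    decoupling mechanism the paper actually uses.
    """
    if len(visual) != len(object_gate):
        raise ValueError("visual and object_gate must have the same length")
    obj = [v * g for v, g in zip(visual, object_gate)]
    ctx = [v * (1.0 - g) for v, g in zip(visual, object_gate)]
    return obj, ctx


def fuse(visual_part: Vector, text_part: Vector, alpha: float = 0.5) -> Vector:
    """Fuse one visual part with the matching text part.

    ASSUMPTION: a weighted average is used purely as a placeholder fusion.
    """
    if len(visual_part) != len(text_part):
        raise ValueError("parts must have the same length")
    return [alpha * v + (1.0 - alpha) * t for v, t in zip(visual_part, text_part)]


def build_prompt(
    visual: Vector,
    object_gate: Vector,
    text_context: Vector,
    text_object: Vector,
    alpha: float = 0.5,
) -> Vector:
    """Build a prompt embedding as [fused context ; fused object]."""
    obj, ctx = decouple(visual, object_gate)
    # Context is fused with the context text, object with the object text,
    # so the two components are aligned separately across modalities.
    return fuse(ctx, text_context, alpha) + fuse(obj, text_object, alpha)


if __name__ == "__main__":
    prompt = build_prompt(
        visual=[2.0, 4.0],
        object_gate=[1.0, 0.0],
        text_context=[0.0, 0.0],
        text_object=[2.0, 2.0],
    )
    print(prompt)
```

The point of the sketch is only the data flow: the object and context parts are fused with their own text counterparts rather than fusing the whole visual embedding with the whole text embedding at once.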


Pages: 180-194

Page count: 15
