Multi-modal Prompts with Feature Decoupling for Open-Vocabulary Object Detection

Cited by: 0
Authors
Wang, Duorui [1]
Zhao, Xiaowei [1]
Affiliations
[1] Beihang Univ, State Key Lab Complex & Crit Software Environm, Beijing 100191, Peoples R China
Keywords
feature decoupling; multi-modal prompts; open-vocabulary object detection; region expansion
DOI
10.1007/978-981-97-6125-8_14
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Open-vocabulary object detection aims to recognize novel categories from text descriptions while training on data from only a limited set of categories. A prompt serves as a template for constructing the textual description of each category. As the field has developed, multi-modal prompts with better performance have emerged. However, existing multi-modal prompts fail to align the context and object components across modalities during construction. To address this issue, we propose an open-vocabulary object detection framework based on multi-modal prompts with feature decoupling. The framework consists of two modules: the construction of Multi-modal Prompts with Feature Decoupling (MPFD) and visual Region Expansion (RE). During prompt construction, MPFD decouples the object and context components from the visual embeddings and then fuses each with the corresponding part of the text embeddings. RE incorporates additional context information into the visual embeddings to enhance the discriminative ability of the prompts. Extensive experiments demonstrate that multi-modal prompts with feature decoupling effectively improve the performance of open-vocabulary object detection models.
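The two modules named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how decoupling, fusion, or region expansion are computed. The sketch assumes a soft per-dimension object mask for decoupling, a weighted average for fusion, and a center-preserving box enlargement for region expansion. All function names and parameters (`expand_box`, `decouple`, `fuse`, `build_prompt`, `object_mask`, `alpha`, `scale`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def expand_box(box: Box, scale: float, width: int, height: int) -> Box:
    """Assumed form of region expansion: enlarge a box about its center
    by `scale` so it takes in surrounding context, clipped to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(width), cx + half_w), min(float(height), cy + half_h))


def decouple(visual: Sequence[float],
             object_mask: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Assumed form of feature decoupling: split a visual embedding into
    object and context parts using a soft mask with values in [0, 1]."""
    obj = [m * v for m, v in zip(object_mask, visual)]
    ctx = [(1 - m) * v for m, v in zip(object_mask, visual)]
    return obj, ctx


def fuse(text_part: Sequence[float], visual_part: Sequence[float],
         alpha: float = 0.5) -> List[float]:
    """Assumed fusion: weighted average of a text part and a visual part."""
    return [(1 - alpha) * t + alpha * v for t, v in zip(text_part, visual_part)]


def build_prompt(text_context: Sequence[float], text_object: Sequence[float],
                 visual: Sequence[float], object_mask: Sequence[float],
                 alpha: float = 0.5) -> List[float]:
    """Fuse context with context and object with object, then concatenate
    the two fused parts into one prompt embedding."""
    obj, ctx = decouple(visual, object_mask)
    return fuse(text_context, ctx, alpha) + fuse(text_object, obj, alpha)
```

The point of the sketch is the pairing: the context part of the visual embedding is fused only with the context part of the text embedding, and likewise for the object part, which is the alignment the abstract says earlier multi-modal prompts lack.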
Pages: 180-194
Number of pages: 15