Multi-modal Prompts with Feature Decoupling for Open-Vocabulary Object Detection

Authors
Wang, Duorui [1]
Zhao, Xiaowei [1]
Affiliation
[1] Beihang Univ, State Key Lab Complex & Crit Software Environm, Beijing 100191, Peoples R China
Keywords
feature decoupling; multi-modal prompts; open-vocabulary object detection; region expansion
DOI
10.1007/978-981-97-6125-8_14
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Open-vocabulary object detection aims to recognize novel categories from text descriptions while training on data from only a limited set of categories. A prompt serves as a template for constructing the textual description of each category. As open-vocabulary object detection has developed, multi-modal prompts with better performance have emerged. However, existing multi-modal prompts fail to align the context and object components across modalities when the prompt is constructed. To address this issue, we propose an open-vocabulary object detection framework based on multi-modal prompts with feature decoupling. The framework consists of two modules: the construction of Multi-modal Prompts with Feature Decoupling (MPFD) and visual Region Expansion (RE). During prompt construction, MPFD decouples the object and context components from the visual embeddings and then fuses each component with the corresponding part of the text embeddings. RE incorporates additional context information into the visual embeddings to enhance the discriminative ability of the prompts. Extensive experiments demonstrate that multi-modal prompts with feature decoupling effectively improve the performance of open-vocabulary object detection models.
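The decouple-then-fuse idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no architectural details, so the soft gate used for decoupling, the linear blend used for fusion, and every name below (`decouple`, `fuse`, `build_prompt`, `object_gate`, `alpha`) are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def decouple(visual: Vector, object_gate: Vector) -> Tuple[Vector, Vector]:
    """Split a visual embedding into object and context parts.

    Assumption: a per-dimension soft gate in [0, 1] marks how much of each
    dimension belongs to the object; the remainder is treated as context.
    """
    obj = [g * v for g, v in zip(object_gate, visual)]
    ctx = [(1.0 - g) * v for g, v in zip(object_gate, visual)]
    return obj, ctx


def fuse(text_part: Vector, visual_part: Vector, alpha: float = 0.5) -> Vector:
    """Blend one text embedding with one visual component.

    Assumption: a simple linear interpolation stands in for the fusion step.
    """
    return [(1.0 - alpha) * t + alpha * v for t, v in zip(text_part, visual_part)]


def build_prompt(
    text_context: List[Vector],
    text_object: Vector,
    visual: Vector,
    object_gate: Vector,
    alpha: float = 0.5,
) -> List[Vector]:
    """Fuse context tokens with the visual context part and the
    category token with the visual object part, keeping token order."""
    obj, ctx = decouple(visual, object_gate)
    fused_context = [fuse(t, ctx, alpha) for t in text_context]
    fused_object = fuse(text_object, obj, alpha)
    return fused_context + [fused_object]


if __name__ == "__main__":
    prompt = build_prompt(
        text_context=[[0.0, 0.0, 0.0, 0.0]],
        text_object=[2.0, 2.0, 2.0, 2.0],
        visual=[1.0, 2.0, 3.0, 4.0],
        object_gate=[1.0, 1.0, 0.0, 0.0],
    )
    print(prompt)
```

The point of the sketch is the pairing: context tokens of the text prompt only ever see the context part of the visual embedding, and the category token only sees the object part, which is the cross-modal alignment the abstract says earlier multi-modal prompts lack.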
Pages: 180-194
Number of pages: 15