Multi-modal Prompts with Feature Decoupling for Open-Vocabulary Object Detection

Times cited: 0
Authors
Wang, Duorui [1]
Zhao, Xiaowei [1]
Affiliation
[1] Beihang Univ, State Key Lab Complex & Crit Software Environm, Beijing 100191, Peoples R China
Keywords
feature decoupling; multi-modal prompts; open-vocabulary object detection; region expansion
DOI
10.1007/978-981-97-6125-8_14
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Open-vocabulary object detection aims to recognize novel categories from their text descriptions while training only on data that covers a limited set of categories. A prompt serves as a template that helps construct the textual description of each category. As open-vocabulary object detection has developed, better-performing multi-modal prompts have emerged. However, existing multi-modal prompts fail to align the context and object components across modalities when the prompts are constructed. To address this issue, we propose an open-vocabulary object detection framework based on multi-modal prompts with feature decoupling. The framework consists of two modules: the construction of Multi-modal Prompts with Feature Decoupling (MPFD) and visual Region Expansion (RE). During prompt construction, MPFD decouples the object and context components of the visual embeddings and then fuses each of them with the corresponding part of the text embeddings. RE incorporates additional context information into the visual embeddings to enhance the discriminative ability of the prompts. Extensive experiments demonstrate that multi-modal prompts with feature decoupling effectively improve the performance of open-vocabulary object detection models.
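The abstract does not specify how RE enlarges a region or how MPFD separates and fuses the components, so the pure-Python sketch below illustrates only one possible reading, under explicitly stated assumptions. It is not the authors' implementation. The helpers `expand_box`, `decouple`, and `build_prompt`, the expansion `ratio`, and the fusion weight `alpha` are all hypothetical. Region expansion is assumed to enlarge a box around its centre and clip it to the image. The context component is assumed to be the residual between the embeddings of the expanded and tight regions. Fusion is assumed to be a weighted addition of each visual component to its matching text part (learnable context tokens or the class-name token).

```python
# Hypothetical Region Expansion: enlarge an (x1, y1, x2, y2) box around its
# centre by `ratio`, then clip it to the image so the crop gains context.
def expand_box(box, ratio, img_w, img_h):
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * ratio / 2, (y2 - y1) * ratio / 2
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(img_w), cx + half_w), min(float(img_h), cy + half_h))


def add(u, v):
    # Element-wise sum of two equal-length vectors.
    return [a + b for a, b in zip(u, v)]


def scale(u, s):
    # Multiply every element of a vector by a scalar.
    return [a * s for a in u]


# Hypothetical decoupling (an assumption, not the paper's operator): the
# tight-box embedding is the object component, and the difference between
# the expanded-box and tight-box embeddings is the context component.
def decouple(obj_emb, expanded_emb):
    context = [e - o for e, o in zip(expanded_emb, obj_emb)]
    return obj_emb, context


# Hypothetical fusion: the context component is added to every learnable
# context token, and the object component only to the class-name token.
def build_prompt(context_tokens, class_token, obj_vis, ctx_vis, alpha=0.5):
    fused_ctx = [add(t, scale(ctx_vis, alpha)) for t in context_tokens]
    fused_cls = add(class_token, scale(obj_vis, alpha))
    return fused_ctx + [fused_cls]


if __name__ == "__main__":
    print(expand_box((10, 10, 30, 30), 2.0, 100, 100))
    obj, ctx = decouple([1, 2], [3, 5])
    print(build_prompt([[0, 0], [1, 1]], [2, 2], obj, ctx))
```

Keeping the two components separate means each one is fused only with its textual counterpart, which is the cross-modal alignment the abstract says existing multi-modal prompts lack.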
Pages: 180-194
Number of pages: 15