Multi-modal object detection via transformer network

Times cited: 2
Authors
Liu, Wenbing [1, 2]
Wang, Haibo [1, 2]
Gao, Quanxue [1, 3]
Zhu, Zhaorui [1]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, Xian, Shaanxi, Peoples R China
[2] Sci & Technol Electroopt Control Lab, Xian, Henan, Peoples R China
[3] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China
Keywords
image representations; object detection
DOI
10.1049/ipr2.12884
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Because single-modal data usually contain limited information, much effort has been devoted to exploiting the complementary information contained in multi-modal data for various pattern recognition tasks. This paper therefore proposes an object detection method that fully utilizes multi-modal data. First, the method introduces a transformer mechanism to fuse intra-modal and inter-modal features from different modalities. The aim is to exploit the complementarity of the data across modalities, which helps improve multi-modal object detection performance. Second, a contrastive loss is applied for contrastive learning, which enables the authors to make effective use of label information. Extensive experiments on multiple object detection datasets demonstrate the effectiveness of the proposed method.
Pages: 3541-3550
Page count: 10
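The abstract names two ideas without giving their details: transformer-based fusion of intra-modal and inter-modal features, and a contrastive loss that uses label information. The NumPy sketch below illustrates generic versions of both under stated assumptions; it is not the authors' architecture. The two modalities (visible and infrared), the shared projection matrices, the residual connections, and the names `fuse` and `supervised_contrastive_loss` are all hypothetical. For the loss, the supervised contrastive (SupCon) formulation is swapped in as one common way a contrastive loss can exploit labels; the paper's actual loss may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def fuse(tokens_a, tokens_b, w):
    """Hypothetical intra- then inter-modal fusion of two token sets.

    tokens_a: (n_a, d) features of modality A (e.g. visible image, assumed)
    tokens_b: (n_b, d) features of modality B (e.g. infrared image, assumed)
    w: dict of (d, d) projections "q", "k", "v", shared across stages
       only to keep the sketch short (an assumption, not the paper's design).
    """
    def attend(x, ctx):
        # Residual attention block: tokens in x query the tokens in ctx.
        return x + attention(x @ w["q"], ctx @ w["k"], ctx @ w["v"])

    # Intra-modal stage: self-attention within each modality.
    a, b = attend(tokens_a, tokens_a), attend(tokens_b, tokens_b)
    # Inter-modal stage: cross-attention, each modality queries the other
    # (the right-hand side uses the intra-modal outputs before reassignment).
    a, b = attend(a, b), attend(b, a)
    # Joint token set that a detection head could consume.
    return np.concatenate([a, b], axis=0)


def supervised_contrastive_loss(z, labels, tau=0.1):
    """SupCon-style loss: pull same-label embeddings together.

    For each anchor i with positives P(i) (same label, excluding i), the loss is
    -mean_{p in P(i)} log( exp(z_i.z_p / tau) / sum_{a != i} exp(z_i.z_a / tau) ),
    averaged over anchors that have at least one positive.
    """
    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Exclude each anchor from its own denominator.
    sim = np.where(self_mask, -np.inf, z @ z.T / tau)
    # Row-wise log-softmax over all other samples.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0
    # Mean log-probability of the positives for each valid anchor.
    mean_log_pos = np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return -mean_log_pos.mean()
```

A closer match to a standard transformer encoder would use separate projections per stage and per modality, plus layer normalisation and feed-forward sublayers; these are omitted here for brevity. With the loss, embeddings that cluster by label score lower than the same embeddings paired with mismatched labels.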