Multi-modal object detection via transformer network

Cited by: 2
Authors
Liu, Wenbing [1, 2]
Wang, Haibo [1, 2]
Gao, Quanxue [1, 3]
Zhu, Zhaorui [1]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, Xian, Shaanxi, Peoples R China
[2] Sci & Technol Electroopt Control Lab, Xian, Henan, Peoples R China
[3] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China
Keywords
image representations; object detection;
DOI
10.1049/ipr2.12884
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Because single-modal data usually contain limited information, considerable effort has been devoted to exploiting the complementary information contained in multi-modal data. This paper therefore presents an object detection method that makes full use of multi-modal data. First, the method introduces a transformer mechanism to fuse the intra-modal and inter-modal features of the different modalities. The aim is to exploit the complementarity between modalities, which helps improve the performance of multi-modal object detection. Second, a contrastive loss suited to contrastive learning is applied, which enables the method to use label information effectively. Extensive experiments on multiple object detection datasets demonstrate the effectiveness of the proposed method.
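The abstract names two ingredients, transformer-based fusion of intra-modal and inter-modal features and a label-aware contrastive loss, but gives no architectural detail. The following is only a minimal NumPy sketch of what such components might look like, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`attention`, `fuse`, `supervised_contrastive_loss`), the single attention head without learned projections, the residual sum used to combine intra-modal and inter-modal outputs, and the SupCon-style form of the loss with a `temperature` of 0.1.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention (single head, no learned projections)."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights


def fuse(tokens_a, tokens_b):
    """Hypothetical fusion of two modalities (assumed design, not from the paper).

    Intra-modal: each modality attends to its own tokens.
    Inter-modal: each modality then queries the other one.
    """
    a_intra, _ = attention(tokens_a, tokens_a, tokens_a)
    b_intra, _ = attention(tokens_b, tokens_b, tokens_b)
    a_cross, _ = attention(a_intra, b_intra, b_intra)
    b_cross, _ = attention(b_intra, a_intra, a_intra)
    # Residual sum keeps the intra-modal features alongside the cross-modal ones.
    return a_intra + a_cross, b_intra + b_cross


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Label-aware contrastive loss in the SupCon style (assumed form)."""
    labels = np.asarray(labels)
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    not_self = ~np.eye(len(labels), dtype=bool)

    # Log-softmax of each anchor's similarities over all other samples.
    masked = np.where(not_self, sim, -np.inf)
    m = masked.max(axis=1, keepdims=True)
    log_denom = m + np.log(np.exp(masked - m).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Positives share the anchor's label; the anchor itself is excluded.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / pos_count[has_pos]).mean())
```

As a usage illustration, `fuse(rgb_tokens, thermal_tokens)` on arrays of shape `(n_a, d)` and `(n_b, d)` returns fused arrays of the same shapes, and `supervised_contrastive_loss(embeddings, labels)` returns a non-negative scalar that shrinks as same-label embeddings move closer together. The modality names here are placeholders; the abstract does not say which modalities the paper uses.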
Pages: 3541-3550
Page count: 10
Related papers
50 in total
  • [41] Multiple object tracking via multi-layer multi-modal framework
    Kang, Hang-Bong
    Chun, Kihong
    IMAGE ANALYSIS, PROCEEDINGS, 2007, 4522 : 789 - +
  • [42] Transformer based Multi-modal Memory-augmented Masked Network for Air Crisis Event Detection
    Yang, Yang
    Zhang, Yishan
    Qian, Shengsheng
    Zhang, Minghua
    Cai, Kaiquan
    2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023, : 3290 - 3297
  • [43] Imagery in multi-modal object learning
    Jüttner, M
    Rentschler, I
    BEHAVIORAL AND BRAIN SCIENCES, 2002, 25 (02) : 197 - +
  • [44] Multi-level Interaction Network for Multi-Modal Rumor Detection
    Zou, Ting
    Qian, Zhong
    Li, Peifeng
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [45] MAFE: Multi-modal Alignment via Mutual Information Maximum Perspective in Multi-modal Fake News Detection
    Qin, Haimei
    Jing, Yaqi
    Duan, Yunqiang
    Jiang, Lei
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 1515 - 1521
  • [46] Multi-modal object detection and localization for high integrity driving assistance
    Sergio Alberto Rodríguez Flórez
    Vincent Frémont
    Philippe Bonnifait
    Véronique Cherfaoui
    Machine Vision and Applications, 2014, 25 : 583 - 598
  • [47] CrossFormer: Cross-guided attention for multi-modal object detection
    Lee, Seungik
    Park, Jaehyeong
    Park, Jinsun
    PATTERN RECOGNITION LETTERS, 2024, 179 : 144 - 150
  • [48] Multi-Modal Dataset Generation using Domain Randomization for Object Detection
    Marez, Diego
    Nans, Lena
    Borden, Samuel
    GEOSPATIAL INFORMATICS XI, 2021, 11733
  • [49] Multi-modal object detection and localization for high integrity driving assistance
    Florez, Sergio Alberto Rodriguez
    Fremont, Vincent
    Bonnifait, Philippe
    Cherfaoui, Veronique
    MACHINE VISION AND APPLICATIONS, 2014, 25 (03) : 583 - 598
  • [50] Leveraging Uncertainties for Deep Multi-modal Object Detection in Autonomous Driving
    Feng, Di
    Cao, Yifan
    Rosenbaum, Lars
    Timm, Fabian
    Dietmayer, Klaus
    2020 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2020, : 871 - 878