Multi-modal object detection via transformer network

Cited by: 2
Authors
Liu, Wenbing [1, 2]
Wang, Haibo [1, 2]
Gao, Quanxue [1, 3]
Zhu, Zhaorui [1]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, Xian, Shaanxi, Peoples R China
[2] Sci & Technol Electroopt Control Lab, Xian, Henan, Peoples R China
[3] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China
Keywords
image representations; object detection
DOI
10.1049/ipr2.12884
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Because single-modal data usually contain limited information, considerable effort has been devoted to exploiting the complementary information in multi-modal data. This paper presents an object detection method that makes full use of multi-modal data. First, the method introduces a transformer mechanism to fuse the intra-modal and inter-modal features of the different modalities, exploiting the complementarity between modalities to improve multi-modal object detection performance. Second, it applies a contrastive loss suited to contrastive learning, which allows label information to be used effectively. Extensive experiments on multiple object detection datasets demonstrate the effectiveness of the proposed method.
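The abstract does not give the form of the contrastive loss, so the following is only a minimal sketch of one common label-aware formulation (a supervised contrastive loss), not the authors' implementation. The function name `supervised_contrastive_loss`, the `temperature` parameter, and the toy embeddings are all assumptions made for illustration. Samples that share a label are treated as positives and pulled together; all other samples in the batch act as negatives.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Hypothetical label-aware contrastive loss over a small batch.

    embeddings: list of equal-length float vectors (e.g. fused features).
    labels: list of class labels, one per embedding.
    Returns the mean loss over anchors that have at least one positive.
    """
    # L2-normalise so that the dot product is a cosine similarity.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing

        # Temperature-scaled similarity of the anchor to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }

        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))

        # Average negative log-probability assigned to the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1

    return total / counted if counted else 0.0


if __name__ == "__main__":
    # Toy batch: two tight clusters in 2-D.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(emb, [0, 0, 1, 1]))  # labels match clusters
    print(supervised_contrastive_loss(emb, [0, 1, 0, 1]))  # labels cut across clusters
```

In this toy batch, the loss is small when the labels agree with the geometric clusters and large when they cut across them, which is the sense in which such a loss makes use of label information.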
Pages: 3541 - 3550
Page count: 10
Related papers
50 in total
  • [21] Deformable Feature Fusion Network for Multi-Modal 3D Object Detection
    Guo, Kun
    Gan, Tong
    Ding, Zhao
    Ling, Qiang
    2024 3RD INTERNATIONAL CONFERENCE ON ROBOTICS, ARTIFICIAL INTELLIGENCE AND INTELLIGENT CONTROL, RAIIC 2024, 2024, : 363 - 367
  • [22] TMD-FS: Improving Few-Shot Object Detection with Transformer Multi-modal Directing
    Yuan, Ying
    Duan, Lijuan
    Wang, Wenjian
    En, Qing
    PATTERN RECOGNITION AND COMPUTER VISION, PT IV, 2021, 13022 : 447 - 458
  • [23] CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection
    Zhang, Yanan
    Chen, Jiaxin
    Huang, Di
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 898 - 907
  • [24] UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection
    Guo, Ruohao
    Ying, Xianghua
    Qi, Yanyu
    Qu, Liao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7622 - 7635
  • [25] Multi-modal of object trajectories
    Partsinevelos, P.
    JOURNAL OF SPATIAL SCIENCE, 2008, 53 (01) : 17 - 30
  • [26] Multi-Modal Knowledge Graph Transformer Framework for Multi-Modal Entity Alignment
    Li, Qian
    Ji, Cheng
    Guo, Shu
    Liang, Zhaoji
    Wang, Lihong
    Li, Jianxin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 987 - 999
  • [27] Three-Dimensional Object Detection Network Based on Multi-Layer and Multi-Modal Fusion
    Zhu, Wenming
    Zhou, Jia
    Wang, Zizhe
    Zhou, Xuehua
    Zhou, Feng
    Sun, Jingwen
    Song, Mingrui
    Zhou, Zhiguo
    ELECTRONICS, 2024, 13 (17)
  • [28] Hierarchical Multi-Modal Prompting Transformer for Multi-Modal Long Document Classification
    Liu, Tengfei
    Hu, Yongli
    Gao, Junbin
    Sun, Yanfeng
    Yin, Baocai
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 6376 - 6390
  • [29] On Addressing Network Synchronization in Object Tracking with Multi-modal Sensors
    Jung, Sangkil
    Lee, Jinseok
    Hong, Sangjin
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2009, 3 (04) : 344 - 365
  • [30] Object detection in multi-modal images using genetic programming
    Bhanu, B
    Lin, YQ
    APPLIED SOFT COMPUTING, 2004, 4 (02) : 175 - 201