Multi-modal object detection via transformer network

Cited by: 2
Authors
Liu, Wenbing [1, 2]
Wang, Haibo [1, 2]
Gao, Quanxue [1, 3]
Zhu, Zhaorui [1]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, Xian, Shaanxi, Peoples R China
[2] Sci & Technol Electroopt Control Lab, Xian, Henan, Peoples R China
[3] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China
Keywords
image representations; object detection
DOI
10.1049/ipr2.12884
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Because single-modal data usually contain limited information, considerable effort has been devoted to exploiting the complementary information contained in multi-modal data. This paper therefore presents an object detection method that makes full use of multi-modal data. First, the method introduces the transformer mechanism to fuse the intra-modal and inter-modal features of the different modalities. The aim is to exploit the complementarity between modalities, which helps to improve the performance of multi-modal object detection. Second, a contrastive loss is applied so that label information can be used effectively. Extensive experiments on multiple object detection datasets demonstrate the effectiveness of the proposed method.
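The abstract gives no architectural details, so the following is only a minimal sketch of the two ideas it names, not the authors' implementation. It assumes that intra-modal fusion can be illustrated by self-attention within each modality, that inter-modal fusion can be illustrated by cross-attention between modalities, and that the label-aware contrastive loss resembles a supervised contrastive loss. The function names (`attention`, `fuse`, `supervised_contrastive_loss`), the residual sum, and the temperature value are all hypothetical, and learned projections are omitted for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention (no learned projections, for illustration)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

def fuse(tokens_a, tokens_b):
    """Assumed fusion: intra-modal self-attention, then inter-modal cross-attention."""
    a = attention(tokens_a, tokens_a, tokens_a)  # intra-modal, modality A
    b = attention(tokens_b, tokens_b, tokens_b)  # intra-modal, modality B
    a_from_b = attention(a, b, b)                # A queries attend to B tokens
    b_from_a = attention(b, a, a)                # B queries attend to A tokens
    # Residual sum keeps each modality's own features next to the borrowed ones.
    fused_a = [[x + y for x, y in zip(r, s)] for r, s in zip(a, a_from_b)]
    fused_b = [[x + y for x, y in zip(r, s)] for r, s in zip(b, b_from_a)]
    return fused_a, fused_b

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Assumed label-aware contrastive loss: same-label samples act as positives."""
    normed = []
    for e in embeddings:
        norm = math.sqrt(sum(x * x for x in e)) or 1.0
        normed.append([x / norm for x in e])
    n = len(normed)
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-label partner contribute nothing
        logits = {j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
                  for j in range(n) if j != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        losses.append(-sum(logits[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0
```

In this sketch the two modalities may have different token counts, since cross-attention only requires a shared feature dimension. The loss is small when same-label embeddings are close and large when they are far apart, which is one way label information could guide the fused features.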
Pages: 3541-3550
Page count: 10