CrossFormer: Cross-guided attention for multi-modal object detection

Cited by: 10
Authors
Lee, Seungik [1 ]
Park, Jaehyeong [2 ]
Park, Jinsun [2 ,3 ]
Affiliations
[1] Pusan Natl Univ, Dept Informat Convergence Engn Artificial Intellig, 2 Busandaehak ro 63beon gil, Busan 46241, South Korea
[2] Pusan Natl Univ, Sch Comp Sci & Engn, 2 Busandaehak Ro 63beon Gil, Busan 46241, South Korea
[3] Pusan Natl Univ, Ctr Artificial Intelligence Res, 2 Busandaehak ro 63beon gil, Pusan 46241, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Object detection; Multi-modal; Sensor fusion; Transformer;
DOI
10.1016/j.patrec.2024.02.012
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Object detection is one of the essential tasks in a variety of real-world applications such as autonomous driving and robotics. In a real-world scenario, unfortunately, there are numerous challenges such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle the problem, we propose a novel multi-modal object detection model that is built upon a hierarchical transformer and cross-guidance between different modalities. The proposed hierarchical transformer consists of domain-specific feature extraction networks where intermediate features are connected by the proposed Cross-Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one domain is regarded as a guide and the other is assigned to a base. After that, the cross-modal attention from the guide to the base is applied to the base feature. The CGAM works bidirectionally in parallel by exchanging roles between modalities to refine multi-modal features simultaneously. Experimental results on FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method is superior to previous multi-modal detection algorithms quantitatively and qualitatively.
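The guide/base role exchange described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the attention formulation, so the sketch assumes scaled dot-product attention with queries taken from the base tokens, keys and values taken from the guide tokens, a residual addition to the base, and no learned projections. The names `cross_attend`, `cross_guided_block`, `feat_a`, and `feat_b` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(base, guide):
    """Refine `base` tokens using attention over `guide` tokens.

    Assumption: queries come from the base, keys/values from the guide,
    and the attended guide feature is added residually to the base.
    Both inputs are lists of equal-length token vectors.
    """
    dim = len(base[0])
    scale = 1.0 / math.sqrt(dim)
    refined = []
    for query in base:
        # Similarity of this base token to every guide token.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in guide]
        weights = softmax(scores)
        # Weighted sum of guide tokens, one value per channel.
        attended = [
            sum(w * value[d] for w, value in zip(weights, guide))
            for d in range(dim)
        ]
        refined.append([q + a for q, a in zip(query, attended)])
    return refined


def cross_guided_block(feat_a, feat_b):
    """Apply cross-attention in both directions with the roles exchanged.

    Both directions read the original inputs, so neither depends on the
    other's output and they could run in parallel.
    """
    refined_a = cross_attend(base=feat_a, guide=feat_b)
    refined_b = cross_attend(base=feat_b, guide=feat_a)
    return refined_a, refined_b


# Toy features: two tokens with two channels per modality.
feat_a = [[1.0, 0.0], [0.0, 1.0]]
feat_b = [[0.5, 0.5], [0.5, 0.5]]
refined_a, refined_b = cross_guided_block(feat_a, feat_b)
```

Each output keeps the token count and channel width of its own base input, which is what would let such a block sit between matching stages of two feature extractors. A real module would normally add learned query, key, and value projections and multiple heads.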
Pages: 144 - 150
Page count: 7
Related papers
50 in total
  • [31] Multi-Modal Sarcasm Detection with Interactive In-Modal and Cross-Modal Graphs
    Liang, Bin
    Lou, Chenwei
    Li, Xiang
    Gui, Lin
    Yang, Min
    Xu, Ruifeng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4707 - 4715
  • [32] Text-Guided Object Detector for Multi-modal Video Question Answering
    Shen, Ruoyue
    Inoue, Nakamasa
    Shinoda, Koichi
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 1032 - 1042
  • [33] Imagery in multi-modal object learning
    Jüttner, M
    Rentschler, I
    BEHAVIORAL AND BRAIN SCIENCES, 2002, 25 (02) : 197 - +
  • [34] Enhancing Multi-modal Features Using Local Self-attention for 3D Object Detection
    Li, Hao
    Zhang, Zehan
    Zhao, Xian
    Wang, Yulong
    Shen, Yuxi
    Pu, Shiliang
    Mao, Hui
    COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 : 532 - 549
  • [35] Cross-modal incongruity aligning and collaborating for multi-modal sarcasm detection
    Wang, Jie
    Yang, Yan
    Jiang, Yongquan
    Ma, Minbo
    Xie, Zhuyang
    Li, Tianrui
    INFORMATION FUSION, 2024, 103
  • [36] CROSS-MODAL KNOWLEDGE DISTILLATION IN MULTI-MODAL FAKE NEWS DETECTION
    Wei, Zimian
    Pan, Hengyue
    Qiao, Linbo
    Niu, Xin
    Dong, Peijie
    Li, Dongsheng
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4733 - 4737
  • [37] Multi-modal object detection and localization for high integrity driving assistance
    Sergio Alberto Rodríguez Flórez
    Vincent Frémont
    Philippe Bonnifait
    Véronique Cherfaoui
    Machine Vision and Applications, 2014, 25 : 583 - 598
  • [38] Multi-Modal Dataset Generation using Domain Randomization for Object Detection
    Marez, Diego
    Nans, Lena
    Borden, Samuel
    GEOSPATIAL INFORMATICS XI, 2021, 11733
  • [39] Multi-modal object detection and localization for high integrity driving assistance
    Florez, Sergio Alberto Rodriguez
    Fremont, Vincent
    Bonnifait, Philippe
    Cherfaoui, Veronique
    MACHINE VISION AND APPLICATIONS, 2014, 25 (03) : 583 - 598
  • [40] MULTI-MODAL FEATURE FUSION NETWORK FOR GHOST IMAGING OBJECT DETECTION
    Hu, Nan
    Ma, Huimin
    Le, Chao
    Shao, Xuehui
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 351 - 355