CAF-RCNN: multimodal 3D object detection with cross-attention

Authors
Liu, Junting [1]
Liu, Deer [1,2]
Zhu, Lei [1]
Affiliations
[1] Jiangxi Univ Sci & Technol, Sch Civil & Surveying & Mapping Engn, Ganzhou, Jiangxi, Peoples R China
[2] Jiangxi Univ Sci & Technol, Sch Civil & Surveying & Mapping Engn, Ganzhou 341400, Jiangxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D object detection; multimodal fusion; cross-attention mechanism; feature pyramid network
DOI
10.1080/01431161.2023.2261151
Chinese Library Classification number
TP7 [Remote sensing technology]
Discipline classification codes
081102; 0816; 081602; 083002; 1404
Abstract
LiDAR and cameras are pivotal sensors for 3D (three-dimensional) object detection. Because their characteristics differ, a growing number of multimodal object detection methods have been proposed. Popular methods currently rely on hard association between camera features and LiDAR features, but because the features are repeatedly enhanced and aggregated, aligning the two sets of features effectively remains a major challenge. We therefore propose CAF-RCNN. Building on PointRCNN, it uses a Feature Pyramid Network (FPN) to extract high-level semantic features at different scales, then fuses these features with the LiDAR features output by the Set Abstraction (SA) module in PointRCNN and in subsequent steps. For feature fusion, we design a module based on the cross-attention mechanism, CAFM (Cross-Attention Fusion Module). It combines two channel attention streams in a cross-over fashion to exploit rich details about significant objects in the Image Stream and the Geometric Stream. Extensive experiments on the KITTI dataset show that our method is 6.43% higher than PointRCNN in 3D accuracy.
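
The abstract does not give CAFM's exact layers, so the following is only a minimal sketch of what "two channel attention streams combined in a cross-over fashion" could look like. It assumes squeeze-and-excitation style channel weights, image features already sampled at each LiDAR point, and fusion by concatenation. All of these are assumptions, and every name here (`channel_weights`, `cross_attention_fuse`, `random_matrix`, the sizes `N`, `C`, `R`) is hypothetical rather than taken from the paper.

```python
import math
import random


def channel_weights(feats, w1, w2):
    """SE-style channel attention over N feature vectors of length C (assumed design)."""
    n, c = len(feats), len(feats[0])
    # Squeeze: average each channel over all N points.
    squeezed = [sum(row[j] for row in feats) / n for j in range(c)]
    # Excite: ReLU bottleneck, then a sigmoid so each weight lies in (0, 1).
    hidden = [max(0.0, sum(s * w1[j][k] for j, s in enumerate(squeezed)))
              for k in range(len(w1[0]))]
    logits = [sum(h * w2[k][j] for k, h in enumerate(hidden)) for j in range(c)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def cross_attention_fuse(img_feats, pt_feats, img_params, pt_params):
    """Gate each stream with the other stream's channel weights, then concatenate."""
    a_img = channel_weights(img_feats, *img_params)
    a_pt = channel_weights(pt_feats, *pt_params)
    fused = []
    for img_row, pt_row in zip(img_feats, pt_feats):
        # Cross-over: image attention reweights the geometric (point) channels...
        pt_gated = [x * a for x, a in zip(pt_row, a_img)]
        # ...and geometric attention reweights the image channels.
        img_gated = [x * a for x, a in zip(img_row, a_pt)]
        fused.append(pt_gated + img_gated)
    return fused


def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for learned weights or extracted features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, C, R = 8, 4, 2  # points, channels per stream, bottleneck reduction (all assumed)
img_params = (random_matrix(rng, C, C // R), random_matrix(rng, C // R, C))
pt_params = (random_matrix(rng, C, C // R), random_matrix(rng, C // R, C))
img_feats = random_matrix(rng, N, C)
pt_feats = random_matrix(rng, N, C)

fused = cross_attention_fuse(img_feats, pt_feats, img_params, pt_params)
print(len(fused), len(fused[0]))  # 8 8
```

In this sketch each point ends up with `2 * C` fused channels. A trained model would learn the weight matrices and might combine the gated streams differently, for example by summation or a further convolution.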
Pages: 6131-6146
Page count: 16