CAF-RCNN: multimodal 3D object detection with cross-attention

Cited by: 0
Authors
Liu, Junting [1]
Liu, Deer [1, 2]
Zhu, Lei [1]
Affiliations
[1] Jiangxi Univ Sci & Technol, Sch Civil & Surveying & Mapping Engn, Ganzhou, Jiangxi, Peoples R China
[2] Jiangxi Univ Sci & Technol, Sch Civil & Surveying & Mapping Engn, Ganzhou 341400, Jiangxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
3D object detection; multimodal fusion; cross-attention mechanism; feature pyramid network;
DOI
10.1080/01431161.2023.2261151
Chinese Library Classification (CLC) number
TP7 [Remote sensing technology];
Discipline classification codes
081102 ; 0816 ; 081602 ; 083002 ; 1404 ;
Abstract
LiDAR and cameras are pivotal sensors for 3D (three-dimensional) object detection. Because their characteristics differ, a growing number of multimodal object detection methods have been proposed. Popular current methods associate camera features with LiDAR features by hard association, but because the features are frequently enhanced and aggregated, aligning the two sets of features effectively remains a major challenge. We therefore propose CAF-RCNN. Building on PointRCNN, it uses a Feature Pyramid Network (FPN) to extract high-level semantic features at different scales, then fuses these features with the LiDAR features output by the Set Abstraction (SA) modules in PointRCNN and in subsequent steps. For feature fusion, we design CAFM (Cross-Attention Fusion Module), a module based on the cross-attention mechanism. It combines two channel attention streams in a cross-over fashion to exploit rich details about significant objects in the Image Stream and the Geometric Stream. Extensive experiments on the KITTI dataset show that our method is 6.43% higher than PointRCNN in 3D accuracy.
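The abstract gives no implementation details for CAFM, so the following is only a minimal sketch of what "two channel attention streams combined in a cross-over fashion" could look like. It is not the authors' module. The function names `channel_gate` and `cross_attention_fuse` are hypothetical, the learned layers of a real channel attention block are omitted, and the sketch assumes each LiDAR point already has an image feature vector with the same number of channels as its geometric feature vector.

```python
import math


def channel_gate(features):
    """Average each channel over all points, then squash to (0, 1) with a sigmoid.

    Assumption: a real channel attention block would pass the averaged
    vector through learned layers before the sigmoid; that is omitted here.
    """
    n = len(features)
    channels = len(features[0])
    means = [sum(row[c] for row in features) / n for c in range(channels)]
    return [1.0 / (1.0 + math.exp(-m)) for m in means]


def cross_attention_fuse(image_feats, point_feats):
    """Reweight each stream with the other stream's gate, then concatenate.

    image_feats, point_feats: lists of N rows with C floats each
    (one row per LiDAR point; hypothetical layout).
    Returns N rows with 2 * C floats: gated point features, then gated image features.
    """
    if len(image_feats) != len(point_feats):
        raise ValueError("both streams need one feature row per point")
    if len(image_feats[0]) != len(point_feats[0]):
        raise ValueError("this sketch assumes equal channel counts")

    image_gate = channel_gate(image_feats)   # computed from the image stream
    point_gate = channel_gate(point_feats)   # computed from the geometric stream

    fused = []
    for img_row, pts_row in zip(image_feats, point_feats):
        # Cross-over: image gate scales point channels, point gate scales image channels.
        pts_out = [v * g for v, g in zip(pts_row, image_gate)]
        img_out = [v * g for v, g in zip(img_row, point_gate)]
        fused.append(pts_out + img_out)
    return fused
```

Concatenation is just one way to merge the two gated streams; the abstract does not say whether the authors concatenate, sum, or do something else.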
Pages: 6131 / 6146
Number of pages: 16
Related papers
50 in total
  • [11] CASNet: A Cross-Attention Siamese Network for Video Salient Object Detection
    Ji, Yuzhu
    Zhang, Haijun
    Jie, Zequn
    Ma, Lin
    Wu, Q. M. Jonathan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (06) : 2676 - 2690
  • [12] Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers
    Cho, Junhyeong
    Youwang, Kim
    Oh, Tae-Hyun
    COMPUTER VISION - ECCV 2022, PT I, 2022, 13661 : 342 - 359
  • [13] 3D Cascade RCNN: High Quality Object Detection in Point Clouds
    Cai, Qi
    Pan, Yingwei
    Yao, Ting
    Mei, Tao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5706 - 5719
  • [14] 3D multirater RCNN for multimodal multiclass detection and characterisation of extremely small objects
    Sudre, Carole H.
    Anson, Beatriz Gomez
    Ingala, Silvia
    Lane, Chris D.
    Jimenez, Daniel
    Haider, Lukas
    Varsavsky, Thomas
    Smith, Lorna
    Ourselin, Sebastien
    Jager, Rolf H.
    Cardoso, M. Jorge
    INTERNATIONAL CONFERENCE ON MEDICAL IMAGING WITH DEEP LEARNING, VOL 102, 2019, 102 : 447 - 456
  • [15] A joint object detection and semantic segmentation model with cross-attention and inner-attention mechanisms
    Nan, Zhixiong
    Peng, Jizhi
    Jiang, Jingjing
    Chen, Hui
    Yang, Ben
    Xin, Jingmin
    Zheng, Nanning
    NEUROCOMPUTING, 2021, 463 : 212 - 225
  • [16] Multimodal 3D Object Detection from Simulated Pretraining
    Brekke, Asmund
    Vatsendvik, Fredrik
    Lindseth, Frank
    NORDIC ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT, 2019, 1056 : 102 - 113
  • [17] Virtual Sparse Convolution for Multimodal 3D Object Detection
    Wu, Hai
    Wen, Chenglu
    Shi, Shaoshuai
    Li, Xin
    Wang, Cheng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 21653 - 21662
  • [18] Multimodal Transformer for Automatic 3D Annotation and Object Detection
    Liu, Chang
    Qian, Xiaoyan
    Huang, Binxiao
    Qi, Xiaojuan
    Lam, Edmund
    Tan, Siew-Chong
    Wong, Ngai
    COMPUTER VISION, ECCV 2022, PT XXXVIII, 2022, 13698 : 657 - 673
  • [19] ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection
    Shen, Jifeng
    Chen, Yifei
    Liu, Yue
    Zuo, Xin
    Fan, Heng
    Yang, Wankou
    PATTERN RECOGNITION, 2024, 145
  • [20] Pose-RCNN: Joint Object Detection and Pose Estimation Using 3D Object Proposals
    Braun, Markus
    Rao, Qing
    Wang, Yikang
    Flohr, Fabian
    2016 IEEE 19TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2016, : 1546 - 1551