Towards efficient multi-modal 3D object detection: Homogeneous sparse fuse network

Cited by: 0
Authors
Tang, Yingjuan [1]
He, Hongwen [1]
Wang, Yong [1]
Wu, Jingda [2]
Affiliations
[1] Beijing Inst Technol, Sch Mech Engn, Beijing 100081, Peoples R China
[2] Nanyang Technol Univ, Sch Mech & Aerosp Engn, 50 Nanyang Ave, Singapore 639798, Singapore
Keywords
Autonomous driving; 3D object detection; Multi-modal; Sparse convolutional networks; Point cloud and image fusion; Homogeneous fusion;
DOI
10.1016/j.eswa.2024.124945
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
LiDAR-only 3D detection methods struggle with the sparsity of point clouds. To overcome this issue, multi-modal methods have been proposed, but their fusion is a challenge due to the heterogeneous representation of images and point clouds. This paper proposes a novel multi-modal framework, Homogeneous Sparse Fusion (HS-Fusion), which generates pseudo point clouds from depth completion. The proposed framework introduces a 3D foreground-aware middle extractor that efficiently extracts high-responding foreground features from sparse point cloud data. This module can be integrated into existing sparse convolutional neural networks. Furthermore, the proposed homogeneous attentive fusion enables cross-modality consistency fusion. Finally, the proposed HS-Fusion can simultaneously combine 2D image features and 3D geometric features of pseudo point clouds using multi-representation feature extraction. The proposed network has been found to attain better performance on the 3D object detection benchmarks. In particular, the proposed model demonstrates a 4.02% improvement in accuracy compared to the pure model. Moreover, its inference speed surpasses that of other models, thus further validating the efficacy of HS-Fusion.
Pages: 12
Related papers
50 in total
  • [21] SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection
    Zhao, Haimei
    Zhang, Qiming
    Zhao, Shanshan
    Chen, Zhe
    Zhang, Jing
    Tao, Dacheng
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7460 - 7468
  • [22] MLF3D: Multi-Level Fusion for Multi-Modal 3D Object Detection
    Jiang, Han
    Wang, Jianbin
    Xiao, Jianru
    Zhao, Yanan
    Chen, Wanqing
    Ren, Yilong
    Yu, Haiyang
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 1588 - 1593
  • [23] ActiveAnno3D-An Active Learning Framework for Multi-Modal 3D Object Detection
    Ghita, Ahmed
    Antoniussen, Bjork
    Zimmer, Walter
    Greer, Ross
    Cress, Christian
    Mogelmose, Andreas
    Trivedi, Mohan M.
    Knoll, Alois C.
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 1699 - 1706
  • [24] Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization
    Rollo, Federico
    Raiola, Gennaro
    Zunino, Andrea
    Tsagarakis, Nikolaos
    Ajoudani, Arash
    2023 EUROPEAN CONFERENCE ON MOBILE ROBOTS, ECMR, 2023, : 90 - 97
  • [25] Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection
    Liu, Zhanwen
    Cheng, Juanru
    Fan, Jin
    Lin, Shan
    Wang, Yang
    Zhao, Xiangmo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 707 - 717
  • [26] Height-Adaptive Deformable Multi-Modal Fusion for 3D Object Detection
    Li, Jiahao
    Chen, Lingshan
    Li, Zhen
    IEEE ACCESS, 2025, 13 : 52385 - 52396
  • [27] Frustum FusionNet: Amodal 3D Object Detection with Multi-Modal Feature Fusion
    Zuo, Liangyu
    Li, Yaochen
    Han, Mengtao
    Li, Qiao
    Liu, Yuehu
    2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2021, : 2746 - 2751
  • [28] Enhancing 3D object detection through multi-modal fusion for cooperative perception
    Xia, Bin
    Zhou, Jun
    Kong, Fanyu
    You, Yuhe
    Yang, Jiarui
    Lin, Lin
    ALEXANDRIA ENGINEERING JOURNAL, 2024, 104 : 46 - 55
  • [29] Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection
    Huang, Linyan
    Li, Zhiqi
    Sima, Chonghao
    Wang, Wenhai
    Wang, Jingdong
    Qiao, Yu
    Li, Hongyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [30] Multi-modal object detection via transformer network
    Liu, Wenbing
    Wang, Haibo
    Gao, Quanxue
    Zhu, Zhaorui
    IET IMAGE PROCESSING, 2023, 17 (12) : 3541 - 3550