SACINet: Semantic-Aware Cross-Modal Interaction Network for Real-Time 3D Object Detection

Cited by: 1
Authors
Yang, Ying [1]
Yin, Hui [1]
Chong, Ai-Xin [1]
Wan, Jin [2]
Liu, Qing-Yi [1]
Affiliations
[1] Beijing Jiaotong Univ, Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Key Lab Beijing Railway Engn, Beijing 100044, Peoples R China
Source
Keywords
Semantics; Feature extraction; Three-dimensional displays; Real-time systems; Point cloud compression; Task analysis; Object detection; Autonomous driving; Real-time 3D object detection; semantic occupancy perception; Cross-modal fusion
DOI
10.1109/TIV.2023.3348099
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
LiDAR-camera fusion-based 3D object detection is one of the main visual perception tasks in autonomous driving, and it faces the challenges of small targets and occlusions. Image semantics can help with these issues, yet most existing methods apply semantics only in the cross-modal fusion stage, to compensate for point geometric features, so the advantages of semantic information are not fully explored. Furthermore, the added network complexity caused by introducing semantics is a major obstacle to real-time performance. In this article, we propose a Semantic-Aware Cross-modal Interaction Network (SACINet) for real-time 3D object detection, which introduces high-level semantics into both key stages: image feature extraction and cross-modal fusion. Specifically, we design a Lightweight Semantic-aware Image Feature Extractor (LSIFE) that enhances semantic sampling of objects while cutting a large number of parameters. Additionally, we propose a Semantic-Modulated Cross-modal Interaction Mechanism (SMCIM) to emphasize semantic details during cross-modal fusion. This mechanism performs pairwise interactive fusion among geometric features, semantic-aware point-wise image features, and semantic-aware point-wise segmentation features through the designed Conditions Generation Network (CGN) and Semantic-Aware Point-wise Feature Modulation (SAPFM). Ultimately, we construct a real-time (25.2 fps) 3D detector with a small parameter footprint (23.79 MB) that achieves a better trade-off between accuracy and efficiency. Comprehensive experiments on the KITTI benchmark show that SACINet is effective for real-time 3D detection, especially on small and severely occluded targets. Furthermore, semantic occupancy perception experiments on the recent nuScenes-Occupancy benchmark verify the effectiveness of SMCIM.
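The abstract does not specify how CGN and SAPFM are computed. As a rough illustration only, the sketch below assumes a FiLM-style (feature-wise linear modulation) design: a hypothetical condition generator maps each point's semantic feature to a per-channel scale `gamma` and shift `beta`, which then modulate that point's geometric feature. The function names, the single-linear-layer generator, and the toy weights are all assumptions for illustration, not the authors' SACINet implementation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]
# Hypothetical generator parameters: (w_gamma, b_gamma, w_beta, b_beta)
Params = Tuple[Matrix, Sequence[float], Matrix, Sequence[float]]


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> Vector:
    """Dense layer: y[j] = sum_i weight[j][i] * x[i] + bias[j]."""
    return [sum(wji * xi for wji, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def generate_conditions(semantic: Sequence[float], w_gamma: Matrix, b_gamma: Sequence[float],
                        w_beta: Matrix, b_beta: Sequence[float]) -> Tuple[Vector, Vector]:
    """Hypothetical condition generator (assumption): one linear layer each for scale and shift."""
    return linear(semantic, w_gamma, b_gamma), linear(semantic, w_beta, b_beta)


def modulate(feature: Sequence[float], gamma: Sequence[float], beta: Sequence[float]) -> Vector:
    """FiLM-style modulation: out[c] = gamma[c] * feature[c] + beta[c]."""
    if not (len(feature) == len(gamma) == len(beta)):
        raise ValueError("channel dimensions must match")
    return [g * f + b for f, g, b in zip(feature, gamma, beta)]


def modulate_points(geometric: Sequence[Sequence[float]],
                    semantic: Sequence[Sequence[float]],
                    params: Params) -> List[Vector]:
    """Condition each point's geometric feature on that same point's semantic feature."""
    out = []
    for geo, sem in zip(geometric, semantic):
        gamma, beta = generate_conditions(sem, *params)
        out.append(modulate(geo, gamma, beta))
    return out


if __name__ == "__main__":
    # Toy example: 2 points, 2 geometric channels, 1 semantic channel per point.
    params = ([[1.0], [2.0]], [1.0, 0.0], [[0.0], [1.0]], [0.5, 0.0])
    geometric = [[1.0, 2.0], [1.0, 2.0]]
    semantic = [[0.0], [1.0]]
    print(modulate_points(geometric, semantic, params))  # → [[1.5, 0.0], [2.5, 5.0]]
```

With these toy weights, the point whose semantic value is 0 has its second geometric channel suppressed (its `gamma` is 0), while the point whose semantic value is 1 is scaled up. Per-point, per-channel gating of this kind is the general idea behind semantic-conditioned feature modulation; the paper's actual CGN and SAPFM may differ substantially.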
Pages: 3917-3927
Number of pages: 11