SACINet: Semantic-Aware Cross-Modal Interaction Network for Real-Time 3D Object Detection

Cited by: 1
Authors
Yang, Ying [1 ]
Yin, Hui [1 ]
Chong, Ai-Xin [1 ]
Wan, Jin [2 ]
Liu, Qing-Yi [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Key Lab Beijing Railway Engn, Beijing 100044, Peoples R China
Source
Keywords
Semantics; Feature extraction; Three-dimensional displays; Real-time systems; Point cloud compression; Task analysis; Object detection; Autonomous driving; Real-time 3D object detection; Semantic occupancy perception; Cross-modal fusion
DOI
10.1109/TIV.2023.3348099
CLC classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
LiDAR-camera fusion-based 3D object detection is one of the main visual perception tasks in autonomous driving, where small targets and occlusions remain key challenges. Image semantics can help with these issues, yet most existing methods apply semantics only in the cross-modal fusion stage to compensate for point geometric features, leaving the advantages of semantic information under-explored. Furthermore, the added network complexity caused by introducing semantics is a major obstacle to real-time performance. In this article, we propose a Semantic-Aware Cross-modal Interaction Network (SACINet) for real-time 3D object detection, which introduces high-level semantics into both key stages: image feature extraction and cross-modal fusion. Specifically, we design a Lightweight Semantic-aware Image Feature Extractor (LSIFE) to enhance semantic sampling of objects while substantially reducing the number of parameters. Additionally, a Semantic-Modulated Cross-modal Interaction Mechanism (SMCIM) is proposed to stress semantic details in cross-modal fusion. This mechanism conducts pairwise interactive fusion among geometric features, semantic-aware point-wise image features, and semantic-aware point-wise segmentation features via the designed Conditions Generation Network (CGN) and Semantic-Aware Point-wise Feature Modulation (SAPFM). Ultimately, we construct a real-time (25.2 fps) 3D detector with a small parameter footprint (23.79 MB), achieving a favorable trade-off between accuracy and efficiency. Comprehensive experiments on the KITTI benchmark show that SACINet is effective for real-time 3D detection, especially on small and severely occluded targets. We further conduct semantic occupancy perception experiments on the recent nuScenes-Occupancy benchmark, which verify the effectiveness of SMCIM.
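The abstract does not detail SAPFM, but "point-wise feature modulation" conditioned on semantics can be read as FiLM-style conditioning: a small Conditions Generation Network predicts a per-point scale and shift from semantic features, which then modulate the geometric features. The sketch below is an assumption-laden illustration of that reading, not the authors' implementation; the function names, MLP shapes, and dimensions are hypothetical.

```python
import numpy as np

def cgn(sem_feat, W1, b1, W2, b2):
    # Conditions Generation Network (assumed here to be a 2-layer MLP):
    # maps per-point semantic features to modulation conditions (gamma, beta).
    h = np.maximum(sem_feat @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2                       # shape (N, 2 * geo_dim)

def sapfm(geo_feat, sem_feat, params):
    # Semantic-Aware Point-wise Feature Modulation, FiLM-style:
    # scale and shift each point's geometric feature using conditions
    # generated from its semantic-aware image/segmentation feature.
    cond = cgn(sem_feat, *params)
    gamma, beta = np.split(cond, 2, axis=-1)
    return (1.0 + gamma) * geo_feat + beta   # modulated point features

# Toy usage with random weights (dimensions are illustrative only).
rng = np.random.default_rng(0)
N, sem_dim, geo_dim, hid = 1024, 32, 64, 128
params = (rng.normal(size=(sem_dim, hid)) * 0.1, np.zeros(hid),
          rng.normal(size=(hid, 2 * geo_dim)) * 0.1, np.zeros(2 * geo_dim))
geo = rng.normal(size=(N, geo_dim))   # point geometric features
sem = rng.normal(size=(N, sem_dim))   # semantic-aware point-wise features
out = sapfm(geo, sem, params)
print(out.shape)  # (1024, 64)
```

In this reading, the "pairwise interactive fusion" would apply such a modulation between each pair of the three feature streams (geometric, point-wise image, point-wise segmentation) before the final detection head.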
Pages: 3917-3927
Page count: 11