SACINet: Semantic-Aware Cross-Modal Interaction Network for Real-Time 3D Object Detection

Cited by: 1
Authors
Yang, Ying [1]
Yin, Hui [1]
Chong, Ai-Xin [1]
Wan, Jin [2]
Liu, Qing-Yi [1]
Affiliations
[1] Beijing Jiaotong Univ, Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Key Lab Beijing Railway Engn, Beijing 100044, Peoples R China
Source
Keywords
Semantics; Feature extraction; Three-dimensional displays; Real-time systems; Point cloud compression; Task analysis; Object detection; Autonomous driving; Real-time 3D object detection; Semantic occupancy perception; Cross-modal fusion
DOI
10.1109/TIV.2023.3348099
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
LiDAR-camera fusion-based 3D object detection is one of the main visual perception tasks in autonomous driving, and it is challenged by small targets and occlusions. Image semantics help with both problems, yet most existing methods apply semantics only in the cross-modal fusion stage to compensate for point geometric features, which leaves the advantages of semantic information underexplored. Moreover, the added network complexity that comes with introducing semantics is a major obstacle to real-time operation. In this article, we propose a Semantic-Aware Cross-modal Interaction Network (SACINet) for real-time 3D object detection, which introduces high-level semantics into both key stages: image feature extraction and cross-modal fusion. Specifically, we design a Lightweight Semantic-aware Image Feature Extractor (LSIFE) that enhances semantic sampling of objects while substantially reducing the parameter count. Additionally, we propose a Semantic-Modulated Cross-modal Interaction Mechanism (SMCIM) to emphasize semantic details in cross-modal fusion. This mechanism performs pairwise interactive fusion among geometric features, semantic-aware point-wise image features, and semantic-aware point-wise segmentation features, using the designed Conditions Generation Network (CGN) and Semantic-Aware Point-wise Feature Modulation (SAPFM). The result is a real-time (25.2 fps) 3D detector with a small parameter footprint (23.79 MB) that achieves a better trade-off between accuracy and efficiency. Comprehensive experiments on the KITTI benchmark show that SACINet is effective for real-time 3D detection, especially on small and severely occluded targets. We further conduct semantic occupancy perception experiments on the latest nuScenes-Occupancy benchmark, which verify the effectiveness of SMCIM.
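The abstract names a Conditions Generation Network (CGN) and a Semantic-Aware Point-wise Feature Modulation (SAPFM) step but does not say how they are computed. The sketch below is only an illustration of what point-wise, semantics-conditioned modulation could look like. It assumes a generic per-channel scale-and-shift scheme in which a condition generator maps each point's semantic feature to a scale and a shift that are applied to the same point's geometric feature. The function names, the single linear layers, and the `g * (1 + scale) + shift` form are all assumptions made for illustration, not the paper's formulation.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Apply y = W x + b to a single feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def modulate_point_features(
    geometric: List[Vector],
    semantic: List[Vector],
    w_scale: Matrix,
    b_scale: Vector,
    w_shift: Matrix,
    b_shift: Vector,
) -> List[Vector]:
    """Hypothetical point-wise affine modulation (an assumed stand-in for CGN + SAPFM).

    For each point, two linear maps (the assumed "condition generator")
    turn the semantic feature into a per-channel scale and shift, which
    are then applied to that point's geometric feature.
    """
    if len(geometric) != len(semantic):
        raise ValueError("expected one semantic feature per point")
    fused = []
    for g, s in zip(geometric, semantic):
        scale = linear(w_scale, b_scale, s)
        shift = linear(w_shift, b_shift, s)
        # Residual form: a zero scale and zero shift leave the feature unchanged.
        fused.append([gi * (1.0 + sc) + sh for gi, sc, sh in zip(g, scale, shift)])
    return fused


if __name__ == "__main__":
    # Two points, 2 geometric channels, 3 semantic channels (toy values).
    geometric = [[1.0, 2.0], [3.0, 4.0]]
    semantic = [[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]]
    w_scale = [[0.5, 0.0, 0.0], [0.0, 0.0, 0.25]]
    w_shift = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    zeros = [0.0, 0.0]
    print(modulate_point_features(geometric, semantic, w_scale, zeros, w_shift, zeros))
```

In this toy setup, a point whose semantic feature is all zeros passes through unchanged, while a point with non-zero semantics has its geometric channels rescaled and offset. A real detector would learn the weights and operate on batched tensors rather than Python lists.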
Pages: 3917 - 3927
Page count: 11
Related papers
50 in total
  • [1] PointAugmenting: Cross-Modal Augmentation for 3D Object Detection
    Wang, Chunwei
    Ma, Chao
    Zhu, Ming
    Yang, Xiaokang
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 11789 - 11798
  • [2] Semantic-Aware Real-Time Scheduling in Robotics
    Mastrogiovanni, Fulvio
    Paikan, Ali
    Sgorbissa, Antonio
    IEEE TRANSACTIONS ON ROBOTICS, 2013, 29 (01) : 118 - 135
  • [3] Cross-modal hierarchical interaction network for RGB-D salient object detection
    Bi, Hongbo
    Wu, Ranwan
    Liu, Ziqi
    Zhu, Huihui
    Zhang, Cong
    Xiang, Tian-Zhu
    PATTERN RECOGNITION, 2023, 136
  • [4] HSNet: An Intelligent Hierarchical Semantic-Aware Network System for Real-Time Semantic Segmentation
    Peng, Xin
    Cheng, Jieren
    Tang, Xiangyan
    Deng, Ziqi
    Tu, Wenxuan
    Xiong, Neal
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024, 54 (07) : 4318 - 4330
  • [5] Weakly-Supervised Enhanced Semantic-Aware Hashing for Cross-Modal Retrieval
    Zhang, Chao
    Li, Huaxiong
    Gao, Yang
    Chen, Chunlin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (06) : 6475 - 6488
  • [6] 3D Object Detection Method with Image Semantic Feature Guidance and Cross-Modal Fusion of Point Cloud
    Li, Hui
    Wang, Junyin
    Cheng, Yuanzhi
    Liu, Jian
    Zhao, Guowei
    Chen, Shuangmin
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2024, 36 (05) : 734 - 749
  • [7] Unleash the Potential of Image Branch for Cross-modal 3D Object Detection
    Zhang, Yifan
    Zhang, Qijian
    Hou, Junhui
    Yuan, Yixuan
    Xing, Guoliang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [8] Cross-Modal 3D Object Detection and Tracking for Auto-Driving
    Zeng, Yihan
    Ma, Chao
    Zhu, Ming
    Fan, Zhiming
    Yang, Xiaokang
    2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021: 3850 - 3857
  • [9] AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection
    Liu, Zongdai
    Zhou, Dingfu
    Lu, Feixiang
    Fang, Jin
    Zhang, Liangjun
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 15621 - 15630
  • [10] Hardware-Aware Latency Pruning for Real-Time 3D Object Detection
    Shen, Maying
    Mao, Lei
    Chen, Joshua
    Hsu, Justin
    Sun, Xinglong
    Knieps, Oliver
    Maxim, Carmen
    Alvarez, Jose M.
    2023 IEEE INTELLIGENT VEHICLES SYMPOSIUM, IV, 2023