SACINet: Semantic-Aware Cross-Modal Interaction Network for Real-Time 3D Object Detection

Cited by: 1
Authors
Yang, Ying [1]
Yin, Hui [1]
Chong, Ai-Xin [1]
Wan, Jin [2]
Liu, Qing-Yi [1]
Affiliations
[1] Beijing Jiaotong Univ, Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Key Lab Beijing Railway Engn, Beijing 100044, Peoples R China
Source
Keywords
Semantics; Feature extraction; Three-dimensional displays; Real-time systems; Point cloud compression; Task analysis; Object detection; Autonomous driving; Real-time 3D object detection; semantic occupancy perception; Cross-modal fusion;
DOI
10.1109/TIV.2023.3348099
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
LiDAR-camera fusion-based 3D object detection is one of the main visual perception tasks in autonomous driving, and it faces the challenges of small targets and occlusions. Image semantics help with both issues, yet most existing methods apply semantics only in the cross-modal fusion stage to compensate for point geometric features, so the advantages of semantic information are not fully exploited. Further, the added network complexity caused by introducing semantics is a major obstacle to real-time performance. In this article, we propose a Semantic-Aware Cross-modal Interaction Network (SACINet) for real-time 3D object detection, which introduces high-level semantics into two key stages: image feature extraction and cross-modal fusion. Specifically, we design a Lightweight Semantic-aware Image Feature Extractor (LSIFE) to enhance semantic sampling of objects while substantially reducing the number of parameters. Additionally, a Semantic-Modulated Cross-modal Interaction Mechanism (SMCIM) is proposed to emphasize semantic details in cross-modal fusion. This mechanism performs pairwise interactive fusion among geometric features, semantic-aware point-wise image features, and semantic-aware point-wise segmentation features through the designed Conditions Generation Network (CGN) and Semantic-Aware Point-wise Feature Modulation (SAPFM). Ultimately, we construct a real-time (25.2 fps) 3D detector with few parameters (23.79 MB), which achieves a better trade-off between accuracy and efficiency. Comprehensive experiments on the KITTI benchmark show that SACINet is effective for real-time 3D detection, especially on small and severely occluded targets. Further, we conduct semantic occupancy perception experiments on the recent nuScenes-Occupancy benchmark, which verify the effectiveness of SMCIM.
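The abstract names a Conditions Generation Network and a point-wise feature modulation step but does not say how either is implemented. As a rough illustration only, the sketch below shows a generic affine (scale-and-shift) modulation, a common way to condition one feature stream on another. Every name here (`make_linear`, `linear`, `modulate_points`), the layer sizes, and the formula `g * (1 + gamma) + beta` are assumptions for illustration, not the authors' design.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Build a hypothetical linear layer: small random weights, zero bias."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(x, weight, bias):
    """Apply y = W x + b to a single feature vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def modulate_points(geom_feats, sem_feats, scale_layer, shift_layer):
    """Assumed affine modulation, applied independently to each point.

    For every point, the semantic feature vector predicts a per-channel
    scale (gamma) and shift (beta), which then modulate the geometric
    feature vector: out = g * (1 + gamma) + beta.
    """
    out = []
    for g, s in zip(geom_feats, sem_feats):
        gamma = linear(s, *scale_layer)  # per-channel scale condition
        beta = linear(s, *shift_layer)   # per-channel shift condition
        out.append([gv * (1.0 + ga) + be
                    for gv, ga, be in zip(g, gamma, beta)])
    return out


# Toy usage: 2 points, 3 geometric channels, 2 semantic channels.
rng = random.Random(0)
scale_layer = make_linear(2, 3, rng)
shift_layer = make_linear(2, 3, rng)
geom = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
sem = [[0.0, 0.0], [1.0, -1.0]]
fused = modulate_points(geom, sem, scale_layer, shift_layer)
```

In this toy setup a point whose semantic vector is all zeros passes through unchanged, because both predicted conditions are zero; only points carrying semantic evidence have their geometric features rescaled and shifted.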
Pages: 3917-3927
Page count: 11
Related papers
50 in total
  • [31] Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds
    Simon, Martin
    Amende, Karl
    Kraus, Andrea
    Honer, Jens
    Saemann, Timo
    Kaulbersch, Hauke
    Milz, Stefan
    Gross, Horst Michael
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 1190 - 1199
  • [32] SEMANTIC-AWARE ALIGNMENT NETWORK FOR CROSS-RESOLUTION CHANGE DETECTION
    Zhang, Yijun
    Xiong, Fengchao
    Lu, Jianfeng
    Ye, Minchao
    Zhou, Jun
    Qian, Yuntao
    2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2024), 2024, : 8594 - 8598
  • [33] SemAudio: Semantic-Aware Streaming Communications for Real-Time Audio Transmission
    Wei, Hao
    Xu, Wenjun
    Wang, Fengyu
    Du, Xin
    Zhang, Tiankui
    Zhang, Ping
    2022 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2022), 2022, : 3965 - 3970
  • [34] Semantic-Aware Real-Time Correlation Tracking Framework for UAV Videos
    Xue, Xizhe
    Li, Ying
    Yin, Xiaoyue
    Shang, Changjing
    Peng, Taoxin
    Shen, Qiang
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (04) : 2418 - 2429
  • [35] Cross-Modal Adaptive Interaction Network for RGB-D Saliency Detection
    Du, Qinsheng
    Bian, Yingxu
    Wu, Jianyu
    Zhang, Shiyan
    Zhao, Jian
    APPLIED SCIENCES-BASEL, 2024, 14 (17):
  • [36] Cross-Modal Contrastive Learning for Domain Adaptation in 3D Semantic Segmentation
    Xing, Bowei
    Ying, Xianghua
    Wang, Ruibin
    Yang, Jinfa
    Chen, Taiyan
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 2974 - 2982
  • [37] Real-Time 3D Object Detection and Recognition using a Smartphone
    Chen, Jin
    Zhu, Zhigang
    IMPROVE: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON IMAGE PROCESSING AND VISION ENGINEERING, 2022, : 158 - 165
  • [38] REAL-TIME 3D OBJECT TRACKING
    STEPHENS, RS
    IMAGE AND VISION COMPUTING, 1990, 8 (01) : 91 - 96
  • [39] CPG3D: Cross-Modal Priors Guided 3D Object Reconstruction
    Nie, Weizhi
    Jiao, Chuanqi
    Chang, Rihao
    Qu, Lei
    Liu, An-An
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9383 - 9396
  • [40] AFMCT: adaptive fusion module based on cross-modal transformer block for 3D object detection
    Zhang, Bingli
    Wang, Yixin
    Zhang, Chengbiao
    Jiang, Junzhao
    Pan, Zehao
    Cheng, Jin
    Zhang, Yangyang
    Wang, Xinyu
    Yang, Chenglei
    Wang, Yanhui
    Machine Vision and Applications, 2024, 35