Multi-modal dataset and fusion network for simultaneous semantic segmentation of on-road dynamic objects

Cited: 0
Authors
Cho, Jieun [1 ]
Ha, Jinsu [1 ]
Song, Hamin [1 ]
Jang, Sungmoon [2 ]
Jo, Kichun [3 ]
Affiliations
[1] Konkuk Univ, Dept Smart Vehicle Engn, Seoul 05029, South Korea
[2] Hyundai Motor Co, Automot R&D Div, Seoul 06182, South Korea
[3] Hanyang Univ, Dept Automot Engn, Seoul 04763, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Semantic segmentation; Sensor fusion; Perception; Deep learning; Autonomous driving;
DOI
10.1016/j.engappai.2025.110024
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline Classification Code
0812;
Abstract
An accurate and robust perception system is essential for autonomous vehicles to interact with the various dynamic objects on the road. By applying semantic segmentation to data from the camera and light detection and ranging (LiDAR) sensors, dynamic objects can be classified at the pixel and point levels, respectively. However, relying on a single sensor poses challenges, especially under adverse lighting conditions or with sparse point densities. To address these challenges, this paper proposes a network for simultaneous point cloud and image semantic segmentation based on sensor fusion. The proposed network adopts a modal-specific architecture to fully leverage the characteristics of each sensor's data and achieves geometrically accurate matching through an image, point, and voxel feature fusion module. Additionally, we introduce a dataset that provides semantic labels for synchronized images and point clouds. Experimental results show that the proposed fusion approach outperforms uni-modal methods and demonstrates robust performance even in challenging real-world scenarios. The dataset is publicly available at https://github.com/ailab-konkuk/MultiModal-Dataset.
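The image, point, and voxel feature fusion described in the abstract can be pictured as gathering a per-point image feature by projecting each LiDAR point into the camera frame, gathering a per-point voxel feature through a point-to-voxel index, and fusing all three feature sets with a small MLP. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation; the class name PointImageVoxelFusion, the layer sizes, and the precomputed uv / pt2vox indices are assumptions for illustration.

    # Hypothetical sketch of point-level fusion of image, point, and voxel features.
    # Assumes pixel coordinates (uv) and voxel indices (pt2vox) are precomputed
    # from the camera calibration and the voxelization step, respectively.
    import torch
    import torch.nn as nn

    class PointImageVoxelFusion(nn.Module):
        def __init__(self, c_img, c_pt, c_vox, c_out):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(c_img + c_pt + c_vox, c_out),
                nn.ReLU(inplace=True),
                nn.Linear(c_out, c_out),
            )

        def forward(self, img_feat, pt_feat, vox_feat, uv, pt2vox):
            # img_feat: (C_img, H, W) image feature map
            # pt_feat:  (N, C_pt)     per-point features
            # vox_feat: (V, C_vox)    per-voxel features
            # uv:       (N, 2)        integer pixel coordinates of projected points
            # pt2vox:   (N,)          voxel index of each point
            img_per_pt = img_feat[:, uv[:, 1], uv[:, 0]].t()  # (N, C_img)
            vox_per_pt = vox_feat[pt2vox]                      # (N, C_vox)
            fused = torch.cat([img_per_pt, pt_feat, vox_per_pt], dim=1)
            return self.mlp(fused)                             # (N, C_out)

    # Toy usage with random tensors
    if __name__ == "__main__":
        N, V, H, W = 1024, 256, 64, 128
        fusion = PointImageVoxelFusion(c_img=32, c_pt=16, c_vox=24, c_out=64)
        img_feat = torch.randn(32, H, W)
        pt_feat = torch.randn(N, 16)
        vox_feat = torch.randn(V, 24)
        uv = torch.stack([torch.randint(0, W, (N,)), torch.randint(0, H, (N,))], dim=1)
        pt2vox = torch.randint(0, V, (N,))
        print(fusion(img_feat, pt_feat, vox_feat, uv, pt2vox).shape)  # (1024, 64)

In practice such a module would sit between modal-specific encoders (2D CNN for the image, point/voxel backbones for the LiDAR) and the per-modality segmentation heads; the sketch only shows the feature-gathering and concatenation step.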
Pages: 11