Multi-modal policy fusion for end-to-end autonomous driving

Cited by: 15
Authors
Huang, Zhenbo [1]
Sun, Shiliang [1]
Zhao, Jing [1]
Mao, Liang [1]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, 3663 North Zhongshan Rd, Shanghai 200062, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal policy fusion; Autonomous driving; Reinforcement learning; Robust fused policy;
DOI
10.1016/j.inffus.2023.101834
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Multi-modal learning has made impressive progress in autonomous driving by leveraging information from multiple sensors. Existing feature fusion methods make decisions by integrating perceptions from different sensors. However, such systems can be risky because the fused features become unreliable when one of the sensors fails. Moreover, these methods require either sophisticated geometric designs to align features or complex neural networks to fuse them effectively, significantly increasing the training cost. In this paper, we propose PolicyFuser, a policy fusion method for end-to-end autonomous driving that addresses these issues. PolicyFuser retains an independent decision for each sensor, so no feature alignment or complex neural networks are required. To focus on the best policy, we use reinforcement learning to select the action with the highest Q-value as the primary decision and treat the remaining actions as secondary decisions. The secondary decisions then fine-tune the primary decision through a primary and secondary policy fusion (PSF) module. To bridge the gap between the decisions from different sensors and improve the stability of policy fusion, we use a conditional variational autoencoder (CVAE) to generate pseudo-expert decisions. We demonstrate the effectiveness of our method in CARLA, where it achieves the highest driving scores and handles sensor failures gracefully.
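The primary/secondary selection described in the abstract can be sketched in a few lines: each sensor's policy proposes an action with a Q-value, the highest-Q action becomes the primary decision, and the remaining actions nudge it slightly. This is a minimal illustrative sketch only; the function name `fuse_policies`, the step size `alpha`, and the softmax weighting of secondary actions are assumptions, not the paper's actual PSF module.

```python
import numpy as np

def fuse_policies(actions, q_values, alpha=0.1):
    """Sketch of primary/secondary policy fusion (PSF).

    actions:  (n_sensors, action_dim) per-sensor action proposals
    q_values: (n_sensors,) estimated Q-value of each proposal
    alpha:    how strongly secondary decisions fine-tune the primary one
    """
    actions = np.asarray(actions, dtype=float)
    q_values = np.asarray(q_values, dtype=float)

    # Primary decision: the per-sensor action with the highest Q-value.
    primary_idx = int(np.argmax(q_values))
    primary = actions[primary_idx]

    # Secondary decisions: all remaining per-sensor actions.
    mask = np.ones(len(actions), dtype=bool)
    mask[primary_idx] = False
    if not mask.any():
        return primary  # single sensor: nothing to fuse

    # Weight secondary actions by a softmax over their Q-values.
    w = np.exp(q_values[mask] - q_values[mask].max())
    w /= w.sum()
    secondary = (w[:, None] * actions[mask]).sum(axis=0)

    # Fine-tune the primary decision with a small step toward
    # the secondary consensus.
    return (1 - alpha) * primary + alpha * secondary
```

Because each sensor keeps its own policy, a failed sensor only loses its vote; the fused decision degrades gracefully instead of collapsing with a corrupted joint feature.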
Pages: 11
Related papers
50 records in total
  • [1] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
    Prakash, Aditya
    Chitta, Kashyap
    Geiger, Andreas
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 7073-7083
  • [2] CrossFuser: Multi-Modal Feature Fusion for End-to-End Autonomous Driving Under Unseen Weather Conditions
    Wu, Weishang
    Deng, Xiaoheng
    Jiang, Ping
    Wan, Shaohua
    Guo, Yuanxiong
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (12): 14378-14392
  • [3] Multi-Modal Sensor Fusion-Based Deep Neural Network for End-to-End Autonomous Driving With Scene Understanding
    Huang, Zhiyu
    Lv, Chen
    Xing, Yang
    Wu, Jingda
    IEEE SENSORS JOURNAL, 2021, 21 (10): 11781-11790
  • [4] Multi-Modal Fusion for End-to-End RGB-T Tracking
    Zhang, Lichao
    Danelljan, Martin
    Gonzalez-Garcia, Abel
    van de Weijer, Joost
    Khan, Fahad Shahbaz
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019: 2252-2261
  • [5] MMFN: Multi-Modal-Fusion-Net for End-to-End Driving
    Zhang, Qingwen
    Tang, Mingkai
    Geng, Ruoyu
    Chen, Feiyi
    Xin, Ren
    Wang, Lujia
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022: 8638-8643
  • [6] SymmetricNet: end-to-end mesoscale eddy detection with multi-modal data fusion
    Zhao, Yuxiao
    Fan, Zhenlin
    Li, Haitao
    Zhang, Rui
    Xiang, Wei
    Wang, Shengke
    Zhong, Guoqiang
    FRONTIERS IN MARINE SCIENCE, 2023, 10
  • [7] Multi-Modal Data Augmentation for End-to-End ASR
    Renduchintala, Adithya
    Ding, Shuoyang
    Wiesner, Matthew
    Watanabe, Shinji
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 2394-2398
  • [8] End-to-end Knowledge Retrieval with Multi-modal Queries
    Luo, Man
    Fang, Zhiyuan
    Gokhale, Tejas
    Yang, Yezhou
    Baral, Chitta
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023: 8573-8589
  • [9] End-to-end Multi-modal Video Temporal Grounding
    Chen, Yi-Wen
    Tsai, Yi-Hsuan
    Yang, Ming-Hsuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [10] End-to-End Compound Table Understanding with Multi-Modal Modeling
    Li, Zaisheng
    Li, Yi
    Liang, Qiao
    Li, Pengfei
    Cheng, Zhanzhan
    Niu, Yi
    Pu, Shiliang
    Li, Xi
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 4112-4121