Distributional Deep Reinforcement Learning with a Mixture of Gaussians

Authors
Choi, Yunho [1, 2]
Lee, Kyungjae [1, 2]
Oh, Songhwai [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, ASRI, Seoul 08826, South Korea
Source
2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA) | 2019
Funding
National Research Foundation of Singapore;
Keywords
DOI
10.1109/icra.2019.8793505
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we propose a novel distributional reinforcement learning (RL) method which models the distribution of the sum of rewards using a mixture density network. Recently, it has been shown that modeling the randomness of the return distribution leads to better performance in Atari games and control tasks. Despite the success of the prior work, it has limitations which come from the use of a discrete distribution. First, it needs a projection step and softmax parametrization for the distribution, since it minimizes the KL divergence loss. Secondly, its performance depends on discretization hyperparameters such as the number of atoms and bounds of the support which require domain knowledge. We mitigate these problems with the proposed parameterization, a mixture of Gaussians. Furthermore, we propose a new distance metric called the Jensen-Tsallis distance, which allows the computation of the distance between two mixtures of Gaussians in a closed form. We have conducted various experiments to validate the proposed method, including Atari games and autonomous vehicle driving.
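The abstract does not state the definition of the Jensen-Tsallis distance, so the sketch below only illustrates why a closed form is available for mixtures of Gaussians. It rests on a standard identity: the integral of the product of two Gaussian densities is itself a Gaussian density evaluated at the difference of the means, with the variances added. As a stand-in for the paper's metric (an assumption, not the authors' definition), it computes the squared L2 distance between two mixture densities. The function names and the `(weight, mean, variance)` tuple layout are hypothetical.

```python
import math


def gaussian_overlap(mu1, var1, mu2, var2):
    """Closed-form integral over x of N(x; mu1, var1) * N(x; mu2, var2)."""
    var = var1 + var2
    return math.exp(-0.5 * (mu1 - mu2) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_overlap(p, q):
    """Integral of p(x) * q(x) for mixtures given as lists of (weight, mean, variance)."""
    return sum(
        wi * wj * gaussian_overlap(mi, vi, mj, vj)
        for wi, mi, vi in p
        for wj, mj, vj in q
    )


def squared_l2_distance(p, q):
    """Integral of (p(x) - q(x))^2, expanded into three closed-form overlap terms.

    Assumption: this is an illustrative stand-in, not the paper's exact metric.
    """
    return mixture_overlap(p, p) + mixture_overlap(q, q) - 2.0 * mixture_overlap(p, q)


# Two hypothetical return distributions, each a two-component mixture.
p = [(0.5, -1.0, 0.5), (0.5, 1.0, 0.5)]
q = [(0.3, 0.0, 1.0), (0.7, 2.0, 0.25)]

print(squared_l2_distance(p, p))  # distance of a mixture to itself
print(squared_l2_distance(p, q))  # distance between two different mixtures
```

No projection onto a fixed support and no numerical integration is needed here: the cost is a double sum over mixture components, which is the property the abstract contrasts with the discrete parameterization.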
Pages: 9791-9797
Page count: 7