Distributional Deep Reinforcement Learning with a Mixture of Gaussians

Cited by: 0

Authors:

Choi, Yunho [1,2]

Lee, Kyungjae [1,2]

Oh, Songhwai [1,2]

Affiliations:

[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea

[2] Seoul Natl Univ, ASRI, Seoul 08826, South Korea

Source:

2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA) | 2019

Funding:

National Research Foundation of Singapore;

Keywords:

DOI:

10.1109/icra.2019.8793505

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In this paper, we propose a novel distributional reinforcement learning (RL) method which models the distribution of the sum of rewards using a mixture density network. Recently, it has been shown that modeling the randomness of the return distribution leads to better performance in Atari games and control tasks. Despite the success of the prior work, it has limitations which come from the use of a discrete distribution. First, it needs a projection step and softmax parametrization for the distribution, since it minimizes the KL divergence loss. Secondly, its performance depends on discretization hyperparameters such as the number of atoms and bounds of the support which require domain knowledge. We mitigate these problems with the proposed parameterization, a mixture of Gaussians. Furthermore, we propose a new distance metric called the Jensen-Tsallis distance, which allows the computation of the distance between two mixtures of Gaussians in a closed form. We have conducted various experiments to validate the proposed method, including Atari games and autonomous vehicle driving.
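
The abstract states that the distance between two mixtures of Gaussians can be computed in closed form, but it does not give the definition of the Jensen-Tsallis distance. The sketch below is therefore only an illustration under a stated assumption: it computes the integrated squared difference between two one-dimensional Gaussian mixture densities, a quantity to which the Jensen-Tsallis divergence with entropic index 2 is proportional. It relies on the standard identity that the integral of the product of two Gaussian densities is itself a Gaussian density evaluated at the difference of the means. All function names and the (weights, means, variances) tuple layout are hypothetical and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_inner_product(weights_a, means_a, vars_a, weights_b, means_b, vars_b):
    """Closed-form integral of the product of two 1-D Gaussian mixture densities."""
    total = 0.0
    for wa, ma, va in zip(weights_a, means_a, vars_a):
        for wb, mb, vb in zip(weights_b, means_b, vars_b):
            # Identity: integral of N(x; ma, va) * N(x; mb, vb) dx = N(ma; mb, va + vb)
            total += wa * wb * gaussian_pdf(ma, mb, va + vb)
    return total


def squared_l2_distance(mix_p, mix_q):
    """Integral of (p(x) - q(x))^2 dx for mixtures given as (weights, means, variances).

    Assumption: this stands in for the paper's distance; the abstract does not
    state the exact definition.
    """
    pp = mixture_inner_product(*mix_p, *mix_p)
    qq = mixture_inner_product(*mix_q, *mix_q)
    pq = mixture_inner_product(*mix_p, *mix_q)
    return pp + qq - 2.0 * pq


# Example mixtures (hypothetical values): (weights, means, variances)
mix_p = ([0.3, 0.7], [-1.0, 2.0], [0.5, 1.5])
mix_q = ([0.6, 0.4], [0.0, 3.0], [1.0, 0.8])
distance = squared_l2_distance(mix_p, mix_q)
```

Because every term is a Gaussian density evaluation, no sampling, discretized support, or projection step is needed to evaluate the quantity, which is the kind of property the abstract attributes to the mixture-of-Gaussians parameterization.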

Pages: 9791-9797

Page count: 7
