Efficient distributional reinforcement learning with Kullback-Leibler divergence regularization

Cited by: 0
Authors
Renxing Li
Zhiwei Shang
Chunhua Zheng
Huiyun Li
Qing Liang
Yunduan Cui
Affiliations
[1] School of Software Engineering, Shenzhen Institute of Advanced Technology (SIAT)
[2] University of Science and Technology of China, Department of Automation
[3] Chinese Academy of Sciences
[4] University of Science and Technology of China
Source
Applied Intelligence | 2023 / Vol. 53
Keywords
Distributional reinforcement learning; Sample efficiency; KL divergence regularization;
DOI: not available
Abstract
In this article, we address the issues of stability and data-efficiency in reinforcement learning (RL). A novel RL approach, Kullback-Leibler divergence-regularized distributional RL (KL-C51), is proposed to integrate the advantages of both the stability of distributional RL and the data-efficiency of Kullback-Leibler (KL) divergence-regularized RL in one framework. KL-C51 derives the KL divergence-regularized Bellman equation and TD errors from a distributional perspective and explores approximate strategies for properly mapping the corresponding Boltzmann softmax term into distributions. Evaluated not only on several OpenAI Gym benchmark tasks of varying complexity but also on six Atari 2600 games from the Arcade Learning Environment, the proposed method clearly illustrates the positive effects of KL divergence regularization on distributional RL, including distinctive exploration behaviors and smooth value function updates, and demonstrates an improvement in both learning stability and data-efficiency compared with related baseline approaches.
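The abstract does not give the update rule in closed form, so the following is a minimal NumPy sketch, not the authors' implementation, of the general idea it describes: folding a Boltzmann softmax over next-state action values into a categorical (C51-style) Bellman target on a fixed support. The function name, the temperature parameter tau, and the omission of terminal-state handling are illustrative assumptions.

import numpy as np

def kl_regularized_categorical_target(next_probs, rewards, gamma, support, tau):
    """Illustrative sketch only. next_probs: (batch, actions, atoms) categorical
    probabilities at the next state; rewards: (batch,); support: fixed atom grid
    (atoms,). Returns a (batch, atoms) target distribution on the same grid."""
    batch, n_actions, n_atoms = next_probs.shape
    # Expected return of each next-state action under its categorical distribution
    q_next = next_probs @ support                              # (batch, actions)
    # Boltzmann (softmax) weights over actions; tau is an assumed temperature
    w = np.exp((q_next - q_next.max(axis=1, keepdims=True)) / tau)
    w /= w.sum(axis=1, keepdims=True)
    # Mix the per-action return distributions under the Boltzmann weights
    mixed = np.einsum('ba,bak->bk', w, next_probs)             # (batch, atoms)
    # Bellman-shift the support and project back onto the grid (standard C51 projection)
    tz = np.clip(rewards[:, None] + gamma * support[None, :], support[0], support[-1])
    dz = support[1] - support[0]
    b = (tz - support[0]) / dz
    lo = np.floor(b).astype(int)
    hi = np.ceil(b).astype(int)
    eq = lo == hi                                              # mass landing exactly on an atom
    target = np.zeros((batch, n_atoms))
    for i in range(batch):
        np.add.at(target[i], lo[i], mixed[i] * (hi[i] - b[i]))
        np.add.at(target[i], hi[i], mixed[i] * (b[i] - lo[i]))
        np.add.at(target[i], lo[i][eq[i]], mixed[i][eq[i]])
    return target

As tau approaches zero the Boltzmann weights concentrate on the greedy action and the sketch reduces to the standard C51 target; larger tau spreads the mixture over actions, which is one simple way to realize the softer, KL-regularized backup the abstract refers to.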
Pages: 24847-24863
Number of pages: 16
Related papers (50 total)
  • [1] Efficient distributional reinforcement learning with Kullback-Leibler divergence regularization
    Li, Renxing
    Shang, Zhiwei
    Zheng, Chunhua
    Li, Huiyun
    Liang, Qing
    Cui, Yunduan
    APPLIED INTELLIGENCE, 2023, 53 (21) : 24847 - 24863
  • [2] Optimistic reinforcement learning by forward Kullback-Leibler divergence optimization
    Kobayashi, Taisuke
    NEURAL NETWORKS, 2022, 152 : 169 - 180
  • [3] General Munchausen Reinforcement Learning with Tsallis Kullback-Leibler Divergence
    Zhu, Lingwei
    Chen, Zheng
    Schlegel, Matthew
    White, Martha
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [4] Kullback-Leibler Divergence Metric Learning
    Ji, Shuyi
    Zhang, Zizhao
    Ying, Shihui
    Wang, Liejun
    Zhao, Xibin
    Gao, Yue
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (04) : 2047 - 2058
  • [5] Probabilistic Forecast Reconciliation with Kullback-Leibler Divergence Regularization
    Zhang, Guanyu
    Li, Feng
    Kang, Yanfei
    2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023, 2023, : 601 - 607
  • [6] Rényi Divergence and Kullback-Leibler Divergence
    van Erven, Tim
    Harremoes, Peter
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2014, 60 (07) : 3797 - 3820
  • [7] The fractional Kullback-Leibler divergence
    Alexopoulos, A.
    JOURNAL OF PHYSICS A-MATHEMATICAL AND THEORETICAL, 2021, 54 (07)
  • [8] BOUNDS FOR KULLBACK-LEIBLER DIVERGENCE
    Popescu, Pantelimon G.
    Dragomir, Sever S.
    Slusanschi, Emil I.
    Stanasila, Octavian N.
    ELECTRONIC JOURNAL OF DIFFERENTIAL EQUATIONS, 2016
  • [9] On the Interventional Kullback-Leibler Divergence
    Wildberger, Jonas
    Guo, Siyuan
    Bhattacharyya, Arnab
    Schoelkopf, Bernhard
    CONFERENCE ON CAUSAL LEARNING AND REASONING, VOL 213, 2023, 213 : 328 - 349
  • [10] Kullback-Leibler Divergence Revisited
    Raiber, Fiana
    Kurland, Oren
    ICTIR'17: PROCEEDINGS OF THE 2017 ACM SIGIR INTERNATIONAL CONFERENCE ON THEORY OF INFORMATION RETRIEVAL, 2017, : 117 - 124