Efficient distributional reinforcement learning with Kullback-Leibler divergence regularization

Cited: 0
Authors
Renxing Li
Zhiwei Shang
Chunhua Zheng
Huiyun Li
Qing Liang
Yunduan Cui
Affiliations
[1] School of Software Engineering, Shenzhen Institute of Advanced Technology (SIAT)
[2] University of Science and Technology of China, Department of Automation
[3] Chinese Academy of Sciences
[4] University of Science and Technology of China
Source
Applied Intelligence | 2023, Vol. 53
Keywords
Distributional reinforcement learning; Sample efficiency; KL divergence regularization;
DOI
Not available
Abstract
In this article, we address the issues of stability and data-efficiency in reinforcement learning (RL). A novel RL approach, Kullback-Leibler divergence-regularized distributional RL (KL-C51), is proposed to integrate the advantages of both the stability of distributional RL and the data-efficiency of Kullback-Leibler (KL) divergence-regularized RL in one framework. KL-C51 derives the Bellman equation and the TD errors regularized by KL divergence from a distributional perspective, and explores approximation strategies for properly mapping the corresponding Boltzmann softmax term into distributions. Evaluated not only on several benchmark tasks of varying complexity from OpenAI Gym but also on six Atari 2600 games from the Arcade Learning Environment, the proposed method clearly illustrates the positive effects of KL divergence regularization on distributional RL, including distinctive exploration behaviors and smooth value function updates, and demonstrates an improvement in both learning stability and data-efficiency compared with related baseline approaches.
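The abstract combines two ingredients: C51-style categorical return distributions over a fixed support of atoms, and a KL divergence term paired with a Boltzmann softmax over values. As a minimal illustrative sketch only (the support size, temperature `tau`, and all distributions below are hypothetical examples, not the paper's settings or implementation), the two core quantities can be computed as:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two categorical distributions on the same support.
    eps guards against log(0); this is an illustrative convention."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def boltzmann_policy(dist_per_action, atoms, tau=1.0):
    """Softmax over expected returns E[Z(s,a)] = sum_i p_i * z_i."""
    q_values = dist_per_action @ atoms   # expected value of each action's distribution
    logits = q_values / tau
    logits -= logits.max()               # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Toy example: 2 actions, 5 atoms on a fixed support (values are made up)
atoms = np.linspace(-2.0, 2.0, 5)
dists = np.array([[0.1, 0.2, 0.4, 0.2, 0.1],   # action 0, mean return 0.0
                  [0.0, 0.1, 0.2, 0.3, 0.4]])  # action 1, higher mean return
pi = boltzmann_policy(dists, atoms, tau=0.5)   # action 1 gets more probability mass
print(pi)
print(kl_divergence(dists[0], dists[1]))
```

In KL-regularized methods of this family, a term like `kl_divergence(new_dist, old_dist)` is typically added to the update objective to keep successive value distributions close, which is the smooth value-function-update effect the abstract refers to.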
Pages: 24847-24863 (16 pages)