Efficient distributional reinforcement learning with Kullback-Leibler divergence regularization

Cited by: 0
Authors
Renxing Li
Zhiwei Shang
Chunhua Zheng
Huiyun Li
Qing Liang
Yunduan Cui
Affiliations
[1] School of Software Engineering, University of Science and Technology of China
[2] Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences
[3] Department of Automation, University of Science and Technology of China
Source
Applied Intelligence, 2023, Vol. 53
Keywords
Distributional reinforcement learning; Sample efficiency; KL divergence regularization
Abstract
In this article, we address the issues of stability and data efficiency in reinforcement learning (RL). A novel RL approach, Kullback-Leibler divergence-regularized distributional RL (KL-C51), is proposed to integrate the stability of distributional RL and the data efficiency of Kullback-Leibler (KL) divergence-regularized RL in one framework. KL-C51 derives the KL-divergence-regularized Bellman equation and TD errors from a distributional perspective and explores approximate strategies for properly mapping the corresponding Boltzmann softmax term into distributions. Evaluated on several benchmark tasks of varying complexity from OpenAI Gym as well as six Atari 2600 games from the Arcade Learning Environment, the proposed method clearly illustrates the positive effect of KL divergence regularization on distributional RL, including distinctive exploration behavior and smooth value-function updates, and demonstrates improved learning stability and data efficiency compared with related baseline approaches.
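
The following is a minimal NumPy sketch of the kind of backup the abstract describes: a C51-style categorical TD target in which the greedy action-selection step is replaced by a Boltzmann softmax over action values weighted by a reference policy, i.e. a KL-regularized greedy step whose softmax term is mapped back into a distribution by mixing the per-action return distributions. This is an illustration under stated assumptions, not the authors' implementation: the support grid, the temperature eta, the reference policy pi_ref, and the function name are all hypothetical.

# Minimal sketch of a KL-regularized categorical (C51-style) TD target.
# Assumptions: fixed 51-atom support on [-10, 10], a reference policy
# pi_ref that the KL term keeps the new policy close to, and a
# temperature eta for the Boltzmann softmax. All names are illustrative.
import numpy as np

V_MIN, V_MAX, N_ATOMS = -10.0, 10.0, 51
Z = np.linspace(V_MIN, V_MAX, N_ATOMS)        # fixed value support z_1..z_N
DELTA_Z = (V_MAX - V_MIN) / (N_ATOMS - 1)

def kl_regularized_target(p_next, pi_ref, reward, gamma, eta):
    """Categorical TD target with a KL-regularized (Boltzmann) backup.

    p_next : (n_actions, N_ATOMS) return distributions at the next state.
    pi_ref : (n_actions,) reference policy of the KL regularizer.
    eta    : temperature; eta -> 0 recovers the greedy max-Q backup of C51.
    """
    q = p_next @ Z                            # Q(s', a) = E[Z(s', a)]
    # KL-regularized greedy step: w(a) ∝ pi_ref(a) * exp(Q(s', a) / eta)
    logits = np.log(pi_ref + 1e-12) + q / eta
    w = np.exp(logits - logits.max())
    w /= w.sum()
    # Map the softmax over actions into a single next-state distribution
    # by mixing the per-action return distributions with the weights w.
    mixed = w @ p_next                        # (N_ATOMS,)
    # Standard C51 projection of r + gamma * z onto the fixed support.
    tz = np.clip(reward + gamma * Z, V_MIN, V_MAX)
    b = (tz - V_MIN) / DELTA_Z
    lo = np.floor(b).astype(int)
    hi = np.ceil(b).astype(int)
    target = np.zeros(N_ATOMS)
    np.add.at(target, lo, mixed * (hi - b + (lo == hi)))  # keep mass when lo == hi
    np.add.at(target, hi, mixed * (b - lo))
    return target

# Example: two actions, uniform reference policy; one flat and one
# peaked next-state distribution. The target remains a valid distribution.
p_next = np.stack([np.full(N_ATOMS, 1.0 / N_ATOMS), np.eye(N_ATOMS)[40]])
target = kl_regularized_target(p_next, np.array([0.5, 0.5]),
                               reward=1.0, gamma=0.99, eta=0.5)
assert abs(target.sum() - 1.0) < 1e-8

As eta approaches 0 the mixture weights concentrate on the highest-value action and the greedy C51 target is recovered, while larger eta keeps the backup close to pi_ref, which is the smoothing effect the abstract attributes to the KL term.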
Pages: 24847-24863 (16 pages)