Efficient distributional reinforcement learning with Kullback-Leibler divergence regularization

Cited by: 0
Authors
Renxing Li
Zhiwei Shang
Chunhua Zheng
Huiyun Li
Qing Liang
Yunduan Cui
Affiliations
[1] School of Software Engineering, Shenzhen Institute of Advanced Technology (SIAT)
[2] Department of Automation, University of Science and Technology of China
[3] Chinese Academy of Sciences
[4] University of Science and Technology of China
Source
Applied Intelligence | 2023, Vol. 53
Keywords
Distributional reinforcement learning; Sample efficiency; KL divergence regularization;
DOI
Not available
Abstract
In this article, we address the issues of stability and data efficiency in reinforcement learning (RL). A novel RL approach, Kullback-Leibler divergence-regularized distributional RL (KL-C51), is proposed to integrate the advantages of both the stability of distributional RL and the data efficiency of Kullback-Leibler (KL) divergence-regularized RL in one framework. KL-C51 derives the KL-divergence-regularized Bellman equation and TD errors from a distributional perspective and explores approximate strategies for properly mapping the corresponding Boltzmann softmax term onto distributions. Evaluated not only on several benchmark tasks of varying complexity from OpenAI Gym but also on six Atari 2600 games from the Arcade Learning Environment, the proposed method clearly illustrates the positive effect of KL divergence regularization on distributional RL, including distinctive exploration behavior and smooth value-function updates, and demonstrates an improvement in both learning stability and data efficiency compared with other related baseline approaches.
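To make the kind of backup described above more concrete, the sketch below is a minimal NumPy illustration of a C51-style categorical TD target in which the next-state distribution is a Boltzmann-softmax mixture over actions, one plausible reading of "mapping the Boltzmann softmax term onto distributions". The function name, the temperature tau, the support bounds v_min/v_max, and the specific weighting scheme are illustrative assumptions, not the authors' implementation.

# Hedged sketch: C51-style categorical projection of a Boltzmann-weighted
# distributional Bellman target. Assumed, illustrative parameters throughout;
# this is not the KL-C51 algorithm from the paper, only a plausible sketch.
import numpy as np

def kl_regularized_c51_target(next_probs, reward, gamma=0.99, tau=0.1,
                              v_min=-10.0, v_max=10.0):
    """next_probs: (n_actions, n_atoms) categorical probabilities p(z_i | s', a).
    Returns the (n_atoms,) projected target probabilities m_i."""
    n_actions, n_atoms = next_probs.shape
    z = np.linspace(v_min, v_max, n_atoms)      # fixed support z_1 .. z_N
    delta_z = (v_max - v_min) / (n_atoms - 1)

    # Boltzmann softmax over Q(s', a) = sum_i z_i p_i(s', a), standing in for
    # the soft (KL-regularized) greedy step instead of a hard argmax.
    q_next = next_probs @ z
    w = np.exp((q_next - q_next.max()) / tau)
    w /= w.sum()

    # Mixture of per-action distributions weighted by the softmax policy.
    p_mix = w @ next_probs                       # shape (n_atoms,)

    # Standard C51 categorical projection of T z = r + gamma * z onto the support.
    tz = np.clip(reward + gamma * z, v_min, v_max)
    b = (tz - v_min) / delta_z
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    m = np.zeros(n_atoms)
    np.add.at(m, lower, p_mix * (upper - b + (lower == upper)))
    np.add.at(m, upper, p_mix * (b - lower))
    return m

In a full agent, the returned target m would be compared against the current predicted distribution with a cross-entropy (equivalently KL) loss, which is the usual C51 training signal.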
Pages: 24847-24863
Page count: 16