Adaptive Natural Policy Gradient in Reinforcement Learning

Cited by: 0
Authors
Li, Dazi [1]
Qiao, Zengyuan [1]
Song, Tianheng [1]
Jin, Qibing [1]
Affiliations
[1] Beijing Univ Chem Technol, Inst Automat, Beijing 100190, Peoples R China
Source
PROCEEDINGS OF 2018 IEEE 7TH DATA DRIVEN CONTROL AND LEARNING SYSTEMS CONFERENCE (DDCLS) | 2018
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
Actor-Critic; natural gradient; adadelta; adaptive; value function approximation; reinforcement learning;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
In recent years, policy gradient methods in reinforcement learning have attracted wide attention for their good convergence performance. At the same time, hyperparameter tuning remains a concern. Building on the advantages of the Actor-Critic (AC) structure, this article studies the Natural-Gradient Actor-Critic (NAC) algorithm in the discounted model and then proposes the Natural-Gradient Actor-Critic with ADADELTA (A-NAC) algorithm. ADADELTA is used to adapt the learning rate in the actor network, which further improves the convergence speed of the NAC algorithm. Simulation results show that NAC and A-NAC have better learning efficiency and a faster convergence rate than regular-gradient AC methods.
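The ADADELTA rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' A-NAC implementation: the function names (`adadelta_ascent_step`, `make_state`, `objective`), the defaults `rho=0.95` and `eps=1e-6`, and the toy quadratic objective are all assumptions made for illustration. In an actor-critic setting the `direction` argument would presumably be the natural-gradient estimate supplied by the critic; here it is simply the exact gradient of the toy objective.

```python
import math


def make_state(n):
    """Running averages of squared directions and squared updates (hypothetical helper)."""
    return {"eg2": [0.0] * n, "edx2": [0.0] * n}


def adadelta_ascent_step(theta, direction, state, rho=0.95, eps=1e-6):
    """One ADADELTA step in the ascent direction, applied in place to theta.

    The per-coordinate step size is RMS(past updates) / RMS(past directions),
    so no hand-tuned learning rate is needed.
    """
    for i, g in enumerate(direction):
        # Decaying average of squared ascent directions.
        state["eg2"][i] = rho * state["eg2"][i] + (1.0 - rho) * g * g
        # Update scaled by the ratio of the two running RMS values.
        delta = math.sqrt(state["edx2"][i] + eps) / math.sqrt(state["eg2"][i] + eps) * g
        # Decaying average of squared updates.
        state["edx2"][i] = rho * state["edx2"][i] + (1.0 - rho) * delta * delta
        theta[i] += delta
    return theta


def objective(theta):
    """Toy stand-in for the expected return: J(theta) = -sum(theta_i^2)."""
    return -sum(t * t for t in theta)


# Toy run: ascend J from an arbitrary starting point.
theta = [1.0, -2.0]
state = make_state(len(theta))
initial = objective(theta)
for _ in range(2000):
    direction = [-2.0 * t for t in theta]  # exact gradient of the toy objective
    adadelta_ascent_step(theta, direction, state)

print("initial J:", initial, "final J:", objective(theta))
```

The design point the sketch shows is that the step size is derived entirely from running statistics, which is what removes the manually tuned actor learning rate; how this interacts with a natural-gradient critic is specific to the paper and is not reproduced here.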
Pages: 605 - 610
Number of pages: 6