A Sandpile Model for Reliable Actor-Critic Reinforcement Learning

Cited: 0
Authors
Peng, Yiming [1 ]
Chen, Gang [1 ]
Zhang, Mengjie [1 ]
Pang, Shaoning [2 ]
Affiliations
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington, New Zealand
[2] Unitec Inst Technol, Dept Comp, Auckland, New Zealand
Keywords
DOI: not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Actor-Critic algorithms have been increasingly researched for tackling challenging reinforcement learning problems. These algorithms are usually composed of two distinct learning processes, namely actor (a.k.a. policy) learning and critic (a.k.a. value function) learning. Actor learning depends heavily on critic learning; in particular, unreliable critic learning caused by divergence can significantly undermine the effectiveness of actor-critic algorithms. To address this issue, many successful algorithms have recently been developed with the aim of improving the accuracy of value function approximation. However, these algorithms introduce extra complexity into the learning process and may actually make effective learning more difficult. In this research, we therefore consider a simpler approach to improving the reliability of critic learning. This approach seamlessly integrates an adapted Sandpile Model with the critic learning process so as to achieve a desirable self-organizing property for reliable critic learning. Following this approach, we propose a new actor-critic learning algorithm. Its effectiveness and learning reliability have been evaluated experimentally. As the experimental results clearly demonstrate, the new algorithm can perform substantially better than traditional actor-critic algorithms. Moreover, correlation analysis suggests that learning reliability and effectiveness are strongly correlated. This finding may be important for the future development of powerful reinforcement learning algorithms.
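Since this record does not include the paper's implementation details, the following is only a minimal sketch of a generic one-step actor-critic update in Python, not the authors' sandpile-based algorithm. It illustrates the dependence described in the abstract: the critic learns a state-value estimate via TD(0), and the actor's policy-gradient step is driven by the critic's TD error. The toy chain MDP, step sizes, and all variable names are illustrative assumptions.

```python
# Minimal one-step actor-critic sketch (generic; NOT the sandpile-based method
# proposed in the paper). Environment, hyperparameters, and names are assumed.
import numpy as np

N_STATES, ACTIONS = 5, (-1, +1)        # chain states 0..4; move left or right
GAMMA, ALPHA_V, ALPHA_PI = 0.95, 0.1, 0.05

V = np.zeros(N_STATES)                 # critic: state-value estimates
theta = np.zeros((N_STATES, 2))        # actor: softmax action preferences

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def step(s, a_idx):
    s2 = min(max(s + ACTIONS[a_idx], 0), N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0   # reward only at the right end
    return s2, r, s2 == N_STATES - 1

rng = np.random.default_rng(0)
for episode in range(500):
    s, done = 0, False
    while not done:
        probs = policy(s)
        a = rng.choice(2, p=probs)
        s2, r, done = step(s, a)
        # Critic learning: TD(0) update of V; the same TD error drives the actor.
        td_error = r + (0.0 if done else GAMMA * V[s2]) - V[s]
        V[s] += ALPHA_V * td_error
        # Actor learning: policy-gradient step scaled by the critic's TD error.
        grad_log = -probs
        grad_log[a] += 1.0
        theta[s] += ALPHA_PI * td_error * grad_log
        s = s2

print("Learned state values:", np.round(V, 2))
```

If the critic's estimates diverge, the TD error becomes unreliable and the actor update above is misdirected, which is the failure mode the paper's self-organizing critic learning is intended to mitigate.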
Pages: 4014-4021 (8 pages)