Memetic Evolution Strategy for Reinforcement Learning

Cited by: 0
Authors
Qu, Xinghua [1 ]
Ong, Yew-Soon [1 ]
Hou, Yaqing [2 ]
Shen, Xiaobo [3 ]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[2] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian, Peoples R China
[3] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing, Jiangsu, Peoples R China
Keywords
reinforcement learning; memetic algorithm; evolution strategy; Q learning; NEUROEVOLUTION; GAME; GO;
DOI
10.1109/cec.2019.8789935
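Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification codes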
0808 ; 0809 ;
Abstract
Neuroevolution (i.e., training neural networks with evolutionary computation) has successfully solved a range of challenging reinforcement learning (RL) tasks. However, existing neuroevolution methods suffer from high sample complexity, because their black-box evaluations (i.e., the accumulated rewards of complete Markov decision processes (MDPs)) discard large numbers of temporal frames (i.e., the time-step data instances of an MDP). These temporal frames carry the Markov property of the problem, which can also benefit neural network training through temporal difference (TD) learning. In this paper, we propose a memetic reinforcement learning (MRL) framework that optimizes the RL agent by leveraging both black-box evaluations and temporal frames. To this end, an evolution strategy (ES) is combined with Q learning: ES provides diversified frames to train the agent globally, and Q learning locally exploits the Markov property within the frames to refresh the agent. MRL thus provides a novel memetic framework that allows evaluation-free local search by Q learning. Experiments on classical control problems verify the efficiency of the proposed MRL, which converges significantly faster than canonical ES.
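The interplay described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy five-state chain MDP and a tabular Q function in place of a neural network. All names (`step`, `rollout`, `q_refine`, `memetic_es`) and all hyperparameters are hypothetical. The ES step perturbs the parameters and scores each candidate by its episodic return, while every visited frame is stored. A Q-learning step then refines the ES mean from the stored frames without any additional rollouts.

```python
import random

N_STATES, GOAL, MAX_STEPS, GAMMA = 5, 4, 20, 0.8  # toy chain MDP (assumed)

def step(state, action):
    """Action 1 moves right, action 0 moves left; reward 1.0 on reaching GOAL."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def rollout(q, frames):
    """Run one greedy episode; return its total reward and record each frame."""
    state, total = 0, 0.0
    for _ in range(MAX_STEPS):
        action = 0 if q[state][0] > q[state][1] else 1
        nxt, reward, done = step(state, action)
        frames.add((state, action, reward, nxt, done))
        total += reward
        state = nxt
        if done:
            break
    return total

def q_refine(q, frames, lr=0.5, sweeps=100):
    """Local search without new rollouts: Q-learning (TD) updates on stored frames."""
    for _ in range(sweeps):
        for s, a, r, s2, done in sorted(frames):
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += lr * (target - q[s][a])

def memetic_es(generations=30, pop=10, sigma=0.5, es_lr=0.2, seed=0):
    rng = random.Random(seed)
    mean = [[0.0, 0.0] for _ in range(N_STATES)]
    frames = set()  # the toy MDP is deterministic, so unique frames suffice
    for _ in range(generations):
        # Global search: perturb the parameters, score each candidate by return.
        noises, returns = [], []
        for _ in range(pop):
            eps = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(N_STATES)]
            cand = [[mean[s][a] + sigma * eps[s][a] for a in range(2)]
                    for s in range(N_STATES)]
            noises.append(eps)
            returns.append(rollout(cand, frames))
        baseline = sum(returns) / pop
        for s in range(N_STATES):
            for a in range(2):
                grad = sum((ret - baseline) * eps[s][a]
                           for ret, eps in zip(returns, noises)) / (pop * sigma)
                mean[s][a] += es_lr * grad
        # Local search: refine the ES mean using only the frames already collected.
        q_refine(mean, frames)
    return mean
```

Calling `memetic_es()` returns the refined Q-table, and `rollout(memetic_es(), set())` replays its greedy policy once. The point of the design is that the frames generated during the ES evaluations are reused rather than discarded, so the local search step costs no extra environment evaluations.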
Pages: 1922 - 1928
Page count: 7
Related papers
50 items in total
  • [31] Combining a gradient-based method and an evolution strategy for multi-objective reinforcement learning
    Chen, Diqi
    Wang, Yizhou
    Gao, Wen
    [J]. APPLIED INTELLIGENCE, 2020, 50 (10) : 3301 - 3317
  • [33] Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
    Hu, Xiao
    Wang, Tianshu
    Gong, Min
    Yang, Shaoshi
    [J]. IEEE ACCESS, 2024, 12 : 48210 - 48222
  • [34] Evolution and learning in an intrinsically motivated reinforcement learning robot
    Schembri, Massimiliano
    Mirolli, Marco
    Baldassarre, Gianluca
    [J]. ADVANCES IN ARTIFICIAL LIFE, PROCEEDINGS, 2007, 4648 : 294 - +
  • [35] An Adaptive Memetic Algorithm Using a Synergy of Differential Evolution and Learning Automata
    Sengupta, Abhronil
    Chakraborti, Tathagata
    Konar, Amit
    Kim, Eunjin
    Nagar, Atulya K.
    [J]. 2012 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2012,
  • [36] Self-Augmenting Strategy for Reinforcement Learning
    Huang, Xin
    Xiao, Shuangjiu
    [J]. PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2017), 2017, : 1 - 4
  • [37] Influence zones: A strategy to enhance reinforcement learning
    Braga, Arthur Plinio de S.
    Araujo, Aluizio F. R.
    [J]. NEUROCOMPUTING, 2006, 70 (1-3) : 21 - 34
  • [38] A stochastic exploration strategy for satisficing reinforcement learning
    Katayama, S
    Kobayashi, S
    [J]. INTELLIGENT AUTONOMOUS SYSTEMS: IAS-5, 1998, : 296 - 303
  • [39] Container Scaling Strategy Based on Reinforcement Learning
    Wang, Huaijun
    Zhang, Chenfei
    Li, Junhuai
    Bao, Dan
    Xu, Jiang
    [J]. Security and Communication Networks, 2023, 2023
  • [40] Enhancing A Stock Timing Strategy by Reinforcement Learning
    Li, Yaoming
    Chen, Yun
    [J]. IAENG International Journal of Computer Science, 2021, 48 (04)