An Experience Replay Method Based on Tree Structure for Reinforcement Learning

Cited by: 1
Authors
Jiang, Wei-Cheng [1 ]
Hwang, Kao-Shing [1 ,2 ]
Lin, Jin-Ling [3 ]
Institutions
[1] Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan
[2] Kaohsiung Med Univ, Dept Healthcare Adm & Med Informat, Kaohsiung, Taiwan
[3] Shih Hsin Univ, Dept Informat Management, Taipei 116, Taiwan
Keywords
Reinforcement learning; Computer architecture; Planning; Adaptation models; Predictive models; Computational modeling; Approximation algorithms; dyna-Q architecture; tree structure; experience replay;
DOI
10.1109/TETC.2018.2890682
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
In Q-Learning, a well-known model-free reinforcement learning algorithm, a learning agent explores an environment to update a state-action value function. Because the agent requires no prior information about the environment, it must interact with the environment to collect real experiences, which is an expensive and time-consuming process. To reduce the burden of this interaction, sample efficiency plays an important role in reinforcement learning. This study proposes an adaptive tree structure integrated with experience replay for Q-Learning, called ERTS-Q. In the ERTS-Q method, Q-Learning is used for policy learning, while a tree structure establishes a virtual model that perceives the two continuous states involved in each state transition and calculates the variation between them. After each state transition, states with highly similar variations are aggregated into the same leaf node; otherwise, new leaf nodes are produced. For experience replay, the tree structure predicts the next state and reward from the statistical information stored in the tree nodes. The virtual experiences produced by the tree structure are used for extra learning. Simulations on the mountain car and a maze environment are performed to verify the validity of the proposed model-learning approach.
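The replay scheme described in the abstract follows the Dyna-Q pattern: each real transition updates the Q-function directly and is also stored in a learned model, which is then sampled to generate extra virtual updates. The following is a minimal, hypothetical sketch of that pattern; for simplicity it uses a plain dictionary model keyed on (state, action) rather than the paper's adaptive tree with leaf-node aggregation, and all class and parameter names are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict

class DynaQReplay:
    """Sketch of Dyna-Q-style learning with model-based experience replay."""

    def __init__(self, actions, alpha=0.5, gamma=0.95, epsilon=0.1, n_replay=10):
        self.q = defaultdict(float)   # Q(s, a) table, default 0.0
        self.model = {}               # learned model: (s, a) -> (reward, next_state)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_replay = n_replay      # virtual updates per real step

    def choose(self, s):
        """Epsilon-greedy action selection."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def _update(self, s, a, r, s2):
        """One-step Q-Learning backup."""
        best = max(self.q[(s2, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best - self.q[(s, a)])

    def learn(self, s, a, r, s2):
        self._update(s, a, r, s2)       # direct learning from the real experience
        self.model[(s, a)] = (r, s2)    # store the transition in the model
        for _ in range(self.n_replay):  # replay: extra learning on virtual experiences
            (ps, pa), (pr, ps2) = random.choice(list(self.model.items()))
            self._update(ps, pa, pr, ps2)
```

In ERTS-Q the dictionary model would be replaced by the tree structure, whose leaf nodes store statistics over aggregated state variations and supply the predicted reward and next state for each virtual update.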
Pages: 972 - 982
Page count: 11