An Experience Replay Method Based on Tree Structure for Reinforcement Learning

Cited by: 1
Authors
Jiang, Wei-Cheng [1 ]
Hwang, Kao-Shing [1 ,2 ]
Lin, Jin-Ling [3 ]
Affiliations
[1] Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan
[2] Kaohsiung Med Univ, Dept Healthcare Adm & Med Informat, Kaohsiung, Taiwan
[3] Shih Hsin Univ, Dept Informat Management, Taipei 116, Taiwan
Keywords
Reinforcement learning; Computer architecture; Planning; Adaptation models; Predictive models; Computational modeling; Approximation algorithms; Dyna-Q architecture; tree structure; experience replay
DOI
10.1109/TETC.2018.2890682
CLC Classification Number
TP [automation technology, computer technology]
Subject Classification Code
0812
Abstract
Q-Learning is a well-known model-free reinforcement learning algorithm in which a learning agent explores an environment to update a state-action value function. In reinforcement learning, the agent requires no prior information about the environment, so it must interact with the environment to collect real experiences, an expensive and time-consuming process. To reduce the burden of this interaction, sample efficiency plays an important role in reinforcement learning. This study proposes an adaptive tree structure integrated with experience replay for Q-Learning, called ERTS-Q. In the ERTS-Q method, Q-Learning is used for policy learning, while a tree structure establishes a virtual model: it observes the two continuous states involved in each state transition and calculates the variation between them. After each state transition, states with highly similar variations are aggregated into the same leaf node; otherwise, new leaf nodes are produced. For experience replay, the tree structure predicts the next state and reward from the statistical information stored in its nodes. The virtual experiences produced by the tree structure are then used for additional learning. Simulations on the mountain car problem and a maze environment verify the validity of the proposed model-learning approach.
Pages: 972 - 982
Number of pages: 11
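
The abstract describes three interacting pieces: a Q-learning learner, a tree-structured virtual model that clusters transitions by the similarity of their state variations (s' - s), and a replay phase that draws virtual transitions from the leaf statistics for extra updates. The Python sketch below illustrates that loop under broad assumptions; the flat nearest-leaf clustering stands in for the paper's adaptive tree, and every name (VariationModel, erts_q_step, threshold) is hypothetical rather than taken from the authors' implementation.

import random
from collections import defaultdict

class Leaf:
    """Hypothetical leaf statistics: running means of the state variation
    (s' - s) and the reward for the transitions grouped into this leaf."""
    def __init__(self, delta, reward):
        self.delta = list(delta)
        self.reward = reward
        self.count = 1

    def distance(self, delta):
        return sum((m - d) ** 2 for m, d in zip(self.delta, delta)) ** 0.5

    def absorb(self, delta, reward):
        self.count += 1
        w = 1.0 / self.count  # incremental mean update
        self.delta = [m + w * (d - m) for m, d in zip(self.delta, delta)]
        self.reward += w * (reward - self.reward)

class VariationModel:
    """Flat stand-in for the paper's adaptive tree: per action, transitions
    with similar state variations share a leaf; dissimilar ones open a new
    leaf. Prediction samples a leaf and rolls the query state forward."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.leaves = defaultdict(list)  # action -> [Leaf, ...]

    def insert(self, s, a, reward, s_next):
        delta = [x2 - x1 for x1, x2 in zip(s, s_next)]
        best = min(self.leaves[a], key=lambda l: l.distance(delta), default=None)
        if best is not None and best.distance(delta) < self.threshold:
            best.absorb(delta, reward)
        else:
            self.leaves[a].append(Leaf(delta, reward))  # new leaf for a dissimilar variation

    def predict(self, s, a):
        leaves = self.leaves.get(a)
        if not leaves:
            return None
        leaf = random.choices(leaves, weights=[l.count for l in leaves])[0]
        s_next = tuple(x + d for x, d in zip(s, leaf.delta))
        return s_next, leaf.reward

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    # Standard one-step Q-learning backup. States are tuples so they can
    # serve as dictionary keys; a real implementation over continuous
    # spaces would discretize or use function approximation.
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def erts_q_step(Q, model, s, a, r, s_next, actions, n_replay=10):
    q_update(Q, s, a, r, s_next, actions)  # learn from the real experience
    model.insert(s, a, r, s_next)          # update the virtual model
    # Extra learning from virtual experiences. For brevity we replay from
    # the current state; Dyna-style planning would also sample previously
    # visited states.
    for _ in range(n_replay):
        b = random.choice(actions)
        virtual = model.predict(s, b)
        if virtual is not None:
            v_next, v_r = virtual
            q_update(Q, s, b, v_r, v_next, actions)

With Q = defaultdict(float) and actions a small tuple of discrete actions (as in mountain car), erts_q_step would be called once per real environment step, so each interaction with the environment yields one real update plus n_replay virtual ones, which is the sample-efficiency gain the abstract argues for.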