An Experience Replay Method Based on Tree Structure for Reinforcement Learning

Cited by: 1
Authors
Jiang, Wei-Cheng [1 ]
Hwang, Kao-Shing [1 ,2 ]
Lin, Jin-Ling [3 ]
Institutions
[1] Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan
[2] Kaohsiung Med Univ, Dept Healthcare Adm & Med Informat, Kaohsiung, Taiwan
[3] Shih Hsin Univ, Dept Informat Management, Taipei 116, Taiwan
Keywords
Reinforcement learning; Computer architecture; Planning; Adaptation models; Predictive models; Computational modeling; Approximation algorithms; dyna-Q architecture; tree structure; experience replay;
DOI
10.1109/TETC.2018.2890682
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
In Q-Learning, a well-known model-free reinforcement learning algorithm, a learning agent explores an environment to update a state-action value function. Because the agent requires no prior information about the environment, it must interact with the environment to collect real experiences, which is an expensive and time-consuming process. To reduce the burden of this interaction, sample efficiency plays an important role in reinforcement learning. This study proposes an adaptive tree structure integrated with experience replay for Q-Learning, called ERTS-Q. In the ERTS-Q method, Q-Learning is used for policy learning, while a tree structure establishes a virtual model that perceives the two continuous states involved in each state transition and calculates the variation between them. After each state transition, states with highly similar variations are aggregated into the same leaf node; otherwise, new leaf nodes are produced. For experience replay, the tree structure predicts the next state and reward from the statistical information stored in the tree nodes. The virtual experiences produced by the tree structure are used for extra learning. Simulations on the mountain car and a maze environment are performed to verify the validity of the proposed model-learning approach.
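The replay scheme described in the abstract follows the Dyna-Q pattern: each real transition updates the Q-function directly and is also stored in a learned model, which is then sampled to generate extra virtual updates. The following is a minimal, hypothetical sketch of that pattern; for simplicity it uses a plain dictionary model keyed on (state, action) rather than the paper's adaptive tree with leaf-node aggregation, and all class and parameter names are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict

class DynaQReplay:
    """Sketch of Dyna-Q-style learning with model-based experience replay."""

    def __init__(self, actions, alpha=0.5, gamma=0.95, epsilon=0.1, n_replay=10):
        self.q = defaultdict(float)   # Q(s, a) table, default 0.0
        self.model = {}               # learned model: (s, a) -> (reward, next_state)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_replay = n_replay      # virtual updates per real step

    def choose(self, s):
        """Epsilon-greedy action selection."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def _update(self, s, a, r, s2):
        """One-step Q-Learning backup."""
        best = max(self.q[(s2, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best - self.q[(s, a)])

    def learn(self, s, a, r, s2):
        self._update(s, a, r, s2)       # direct learning from the real experience
        self.model[(s, a)] = (r, s2)    # store the transition in the model
        for _ in range(self.n_replay):  # replay: extra learning on virtual experiences
            (ps, pa), (pr, ps2) = random.choice(list(self.model.items()))
            self._update(ps, pa, pr, ps2)
```

In ERTS-Q the dictionary model would be replaced by the tree structure, whose leaf nodes store statistics over aggregated state variations and supply the predicted reward and next state for each virtual update.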
Pages: 972 - 982
Page count: 11