An Experience Replay Method Based on Tree Structure for Reinforcement Learning

Cited by: 1
Authors
Jiang, Wei-Cheng [1 ]
Hwang, Kao-Shing [1 ,2 ]
Lin, Jin-Ling [3 ]
Affiliations
[1] Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan
[2] Kaohsiung Med Univ, Dept Healthcare Adm & Med Informat, Kaohsiung, Taiwan
[3] Shih Hsin Univ, Dept Informat Management, Taipei 116, Taiwan
Keywords
Reinforcement learning; Computer architecture; Planning; Adaptation models; Predictive models; Computational modeling; Approximation algorithms; Dyna-Q architecture; tree structure; experience replay
DOI
10.1109/TETC.2018.2890682
Chinese Library Classification
TP [automation technology, computer technology]
Discipline classification code
0812
Abstract
Q-Learning is a well-known model-free reinforcement learning algorithm in which a learning agent explores an environment to update a state-action value function. Because the agent requires no prior information about the environment, it must interact with the environment to collect real experiences, which is an expensive and time-consuming process. To reduce this interaction burden, sample efficiency plays an important role in reinforcement learning. This study proposes an adaptive tree structure integrated with experience replay for Q-Learning, called ERTS-Q. In the ERTS-Q method, Q-Learning is used for policy learning, while a tree structure establishes a virtual model: it perceives the two consecutive continuous states involved in each state transition and computes the variation between them. After each state transition, states with highly similar variations are aggregated into the same leaf node; otherwise, new leaf nodes are produced. For experience replay, the tree structure predicts the next state and reward from the statistical information stored in its nodes. The virtual experiences produced by the tree structure are then used for extra learning. Simulations on the mountain-car and maze environments verify the validity of the proposed model-learning approach.
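A minimal sketch of the Dyna-Q-style loop the abstract describes: real Q-Learning updates are supplemented with extra updates replayed from a learned virtual model. The tabular setting, the toy chain environment, and the plain dictionary model are hypothetical stand-ins here; the paper's ERTS-Q replaces the dictionary with an adaptive tree over continuous-state variations.

```python
import random

def dyna_q(env_step, n_states, n_actions, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-Learning with Dyna-style replay from a learned model.

    env_step(s, a) -> (next_state, reward, done). The dict `model` stands
    in for the paper's tree structure: it memorizes observed transitions
    and replays them as virtual experience for extra learning.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (s, a) -> (s', r), built from real experience

    def update(s, a, r, s2):
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection in the real environment
            a = (rng.randrange(n_actions) if rng.random() < epsilon
                 else max(range(n_actions), key=lambda i: Q[s][i]))
            s2, r, done = env_step(s, a)
            update(s, a, r, s2)        # learn from the real transition
            model[(s, a)] = (s2, r)    # record it in the virtual model
            # planning: extra updates from replayed virtual experience
            for _ in range(planning_steps):
                (ps, pa), (ps2, pr) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2)
            s = s2
    return Q

# Example: a 4-state chain where action 1 moves right toward a reward
# at the terminal state 3, and action 0 moves left.
def chain_step(s, a):
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3
```

After training, moving toward the goal should dominate at every non-terminal state; the planning steps are what let the reward propagate back quickly from only a few real successes.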
Pages: 972 - 982
Number of pages: 11
Related Papers
50 in total
  • [31] Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning
    Lin, Yijiong
    Huang, Jiancong
    Zimmer, Matthieu
    Guan, Yisheng
    Rojas, Juan
    Weng, Paul
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2020, 5 (04) : 6615 - 6622
  • [32] Memory Reduction through Experience Classification for Deep Reinforcement Learning with Prioritized Experience Replay
    Shen, Kai-Huan
    Tsai, Pei-Yun
    [J]. PROCEEDINGS OF THE 2019 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2019), 2019, : 166 - 171
  • [33] Double Broad Reinforcement Learning Based on Hindsight Experience Replay for Collision Avoidance of Unmanned Surface Vehicles
    Yu, Jiabao
    Chen, Jiawei
    Chen, Ying
    Zhou, Zhiguo
    Duan, Junwei
    [J]. JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2022, 10 (12)
  • [34] Map-based experience replay: a memory-efficient solution to catastrophic forgetting in reinforcement learning
    Hafez, Muhammad Burhan
    Immisch, Tilman
    Weber, Tom
    Wermter, Stefan
    [J]. FRONTIERS IN NEUROROBOTICS, 2023, 17
  • [35] A sample efficient model-based deep reinforcement learning algorithm with experience replay for robot manipulation
    Zhang, Cheng
    Ma, Liang
    Schmitz, Alexander
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT ROBOTICS AND APPLICATIONS, 2020, 4 (02) : 217 - 228
  • [37] Unveiling the Effects of Experience Replay on Deep Reinforcement Learning-based Power Allocation in Wireless Networks
    Kopic, Amna
    Perenda, Erma
    Gacanin, Haris
    [J]. 2024 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE, WCNC 2024, 2024
  • [38] Multi-Input Autonomous Driving Based on Deep Reinforcement Learning With Double Bias Experience Replay
    Cui, Jianping
    Yuan, Liang
    He, Li
    Xiao, Wendong
    Ran, Teng
    Zhang, Jianbo
    [J]. IEEE SENSORS JOURNAL, 2023, 23 (11) : 11253 - 11261
  • [39] Exploring a Reinforcement Learning Agent with Improved Prioritized Experience Replay for a Confrontation Game
    Zhao, Tian
    [J]. 2022 INTERNATIONAL CONFERENCE ON BIG DATA, INFORMATION AND COMPUTER NETWORK (BDICN 2022), 2022, : 373 - 381
  • [40] Research on Experience Replay of Off-policy Deep Reinforcement Learning: A Review
    Hu, Zi-Jian
    Gao, Xiao-Guang
    Wan, Kai-Fang
    Zhang, Le-Tian
    Wang, Qiang-Long
    Neretin, Evgeny
    [J]. Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (11) : 2237 - 2256