Clustering experience replay for the effective exploitation in reinforcement learning

Cited by: 13
Authors
Li, Min [1 ]
Huang, Tianyi [1 ]
Zhu, William [1 ]
Affiliation
[1] Univ Elect Sci & Technol China, Inst Fundamental & Frontier Sci, Chengdu 610054, Peoples R China
Keywords
Reinforcement learning; Clustering; Experience replay; Exploitation efficiency; Time division;
DOI
10.1016/j.patcog.2022.108875
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Reinforcement learning is a useful tool for training an agent to achieve a desired goal in sequential decision-making problems. It trains the agent to make decisions by exploiting the experience contained in the transitions produced by different decisions. To exploit this experience, most reinforcement learning methods replay the explored transitions by uniform sampling, but this easily ignores the most recently explored transitions. Another approach defines the priority of each transition by its estimation error during training and replays transitions according to these priorities; however, it only updates the priorities of the transitions replayed at the current training step, so transitions with low priorities are ignored. In this paper, we propose a clustering experience replay, called CER, to effectively exploit the experience hidden in all explored transitions during training. CER clusters and replays transitions through a divide-and-conquer framework based on time division as follows. First, it divides the whole training process into several periods. Second, at the end of each period, it uses k-means to cluster the transitions explored in that period. Finally, it constructs a conditional probability density function to ensure that all kinds of transitions are sufficiently replayed in the current training. We construct a new method, TD3_CER, to implement our clustering experience replay on TD3. Through theoretical analysis and experiments, we show that TD3_CER is more effective than existing reinforcement learning methods. The source code can be downloaded from https://github.com/grcai/CER-Master. (c) 2022 Elsevier Ltd. All rights reserved.
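The period-wise clustering idea described in the abstract can be sketched as a replay buffer: transitions accumulate over a period, are clustered with k-means at the period's end, and sampling then draws from clusters so rarely explored kinds of transitions are not starved. This is a minimal illustrative sketch only; the class name, the fixed period length, and the uniform-over-clusters sampling rule are assumptions, not the paper's exact conditional probability density.

```python
import numpy as np

class ClusteringReplayBuffer:
    """Illustrative sketch of clustering experience replay (CER):
    divide training into periods, k-means-cluster each period's
    transitions, then sample cluster-first so every cluster keeps
    a chance of being replayed."""

    def __init__(self, k=4, period=1000, seed=0):
        self.k = k                  # number of k-means clusters per period
        self.period = period        # transitions collected per period
        self.rng = np.random.default_rng(seed)
        self.current = []           # transitions of the ongoing period
        self.clusters = []          # one array of transitions per cluster

    def add(self, transition):
        self.current.append(np.asarray(transition, dtype=float))
        if len(self.current) >= self.period:
            self._cluster_current_period()

    def _cluster_current_period(self):
        data = np.stack(self.current)
        labels = self._kmeans(data, self.k)
        # Keep only non-empty clusters.
        self.clusters = [data[labels == c] for c in range(self.k)
                         if np.any(labels == c)]
        self.current = []

    def _kmeans(self, data, k, iters=20):
        # Plain Lloyd's algorithm on the flattened transition vectors.
        centers = data[self.rng.choice(len(data), size=k, replace=False)]
        labels = np.zeros(len(data), dtype=int)
        for _ in range(iters):
            dists = np.linalg.norm(data[:, None] - centers[None], axis=2)
            labels = dists.argmin(axis=1)
            for c in range(k):
                if np.any(labels == c):
                    centers[c] = data[labels == c].mean(axis=0)
        return labels

    def sample(self, batch_size):
        # Draw the cluster first (uniformly), then a transition within it,
        # so small clusters are not drowned out as under uniform replay.
        out = []
        for _ in range(batch_size):
            cluster = self.clusters[self.rng.integers(len(self.clusters))]
            out.append(cluster[self.rng.integers(len(cluster))])
        return np.stack(out)
```

In a full TD3_CER agent the sampled batch would feed the critic and actor updates; here the sketch only shows how time division and clustering reshape the sampling distribution relative to a uniform buffer.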
Pages: 9