Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control

Citations: 0

Authors:

Wei, Xueyu [1]

Duan, Lilong [1]

Xue, Wei [1]

Affiliations:

[1] Anhui Univ Technol, Sch Comp Sci & Technol, Maanshan 243032, Peoples R China

Source:

JOURNAL OF APPLIED SCIENCE AND ENGINEERING | 2023 / Vol. 26 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Reinforcement learning; Multi-goal learning; Hindsight experience replay; Hindsight bias; Reward-weighted; SHOGI; CHESS; LEVEL;

DOI:

10.6180/jase.202312_26(12).0015

Chinese Library Classification:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

In multi-goal reinforcement learning, an agent learns to achieve multiple goals using a goal-oriented policy, obtaining rewards from positions that have been achieved. The dynamic hindsight experience replay (DHER) method improves learning efficiency by matching the trajectories of past failed episodes to create successful experiences. However, these experiences are sampled and replayed at random, without considering how important each episode sample is for learning. As a result, bias is introduced during training and the gains in sample efficiency are suboptimal. To address these issues, this paper introduces a reward-weighted mechanism built on dynamic hindsight experience replay (RDHER). We extend dynamic hindsight experience replay with a trade-off that makes the rewards calculated for hindsight experiences numerically greater than the actual rewards. Specifically, the hindsight rewards are multiplied by a weighting factor to increase the Q-value of the hindsight state-action pair, which drives the policy update toward selecting the action that maximizes the Q-value for the given hindsight transitions. Our experiments show that the proposed method reduces hindsight bias during training. Further, we demonstrate that RDHER is effective in challenging robot manipulation tasks and outperforms several other multi-goal baseline methods in terms of success rate.
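
The reward-weighting step described in the abstract can be illustrated with a minimal sketch. The abstract only states that hindsight rewards are multiplied by a weighting factor; everything else here is an assumption made for illustration: the 0/1 sparse reward convention, the distance tolerance `tol`, the default `weight=2.0`, the dictionary layout of a transition, and the function names `sparse_reward` and `relabel_with_weight`. The paper's actual reward definition and weight value may differ.

```python
import numpy as np


def sparse_reward(achieved_goal, desired_goal, tol=0.05):
    """Assumed reward convention: 1.0 if the achieved goal lies within
    `tol` of the desired goal, otherwise 0.0."""
    diff = np.asarray(achieved_goal, dtype=float) - np.asarray(desired_goal, dtype=float)
    return float(np.linalg.norm(diff) <= tol)


def relabel_with_weight(transition, hindsight_goal, weight=2.0, tol=0.05):
    """Return a relabeled copy of `transition` whose reward is the
    hindsight reward multiplied by `weight` (hypothetical helper)."""
    relabeled = dict(transition)  # shallow copy; the original is left untouched
    relabeled["goal"] = np.asarray(hindsight_goal, dtype=float)
    base = sparse_reward(transition["achieved_goal"], hindsight_goal, tol)
    # With weight > 1 and a 0/1 reward, a successful hindsight transition
    # receives a reward larger than a real success would (1.0).
    relabeled["reward"] = weight * base
    relabeled["is_hindsight"] = True
    return relabeled


# A failed real transition: the desired goal was not reached.
failed = {
    "achieved_goal": [0.1, 0.2, 0.3],
    "goal": [1.0, 1.0, 1.0],
    "reward": 0.0,
}
# Relabel it with the goal that was actually achieved.
hindsight = relabel_with_weight(failed, hindsight_goal=[0.1, 0.2, 0.3], weight=2.0)
print(hindsight["reward"])
```

Under these assumptions, the relabeled transition carries a larger reward than a real success, which is one way to raise the Q-value targets of hindsight state-action pairs relative to ordinary replayed transitions.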


Pages: 1829-1841

Page count: 13
