Reward-weighted DHER Mechanism For Multi-goal Reinforcement Learning With Application To Robotic Manipulation Control

Authors
Wei, Xueyu [1]
Duan, Lilong [1]
Xue, Wei [1]
Affiliations
[1] Anhui Univ Technol, Sch Comp Sci & Technol, Maanshan 243032, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Multi-goal learning; Hindsight experience replay; Hindsight bias; Reward-weighted; SHOGI; CHESS; LEVEL;
DOI
10.6180/jase.202312_26(12).0015
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
In multi-goal reinforcement learning, an agent learns to achieve multiple goals using a goal-oriented policy, obtaining rewards from positions that have been achieved. The dynamic hindsight experience replay (DHER) method improves learning efficiency by matching the trajectories of past failed episodes to create successful experiences. However, these experiences are sampled and replayed with a random strategy that does not consider how important each episode sample is for learning. As a result, bias is introduced during training and the gains in sample efficiency are suboptimal. To address these issues, this paper introduces a reward-weighted mechanism based on dynamic hindsight experience replay (RDHER). We extend dynamic hindsight experience replay with a trade-off that makes the rewards calculated for hindsight experience numerically greater than the actual rewards. Specifically, the hindsight rewards are multiplied by a weighting factor to increase the Q-value of the hindsight state-action pair, which drives the policy update to select the maximizing action for the given hindsight transitions. Our experiments show that the proposed method reduces hindsight bias during training. Further, we demonstrate that RDHER is effective in challenging robot manipulation tasks and outperforms several other multi-goal baseline methods in terms of success rate.
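The reward-weighting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain hindsight relabeling with the final achieved position as the substitute goal rather than DHER's trajectory matching, and the `Transition` class, the `sparse_reward` function (1 on success, 0 otherwise), the tolerance, and the weight value of 2.0 are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple

Goal = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """Hypothetical replay-buffer entry (names are illustrative)."""
    achieved_goal: Goal      # position reached after taking the action
    goal: Goal               # goal the policy was conditioned on
    reward: float
    hindsight: bool = False  # True if the goal was relabeled in hindsight


def sparse_reward(achieved: Goal, goal: Goal, tol: float = 0.05) -> float:
    """Assumed reward: 1.0 if the goal is reached within tol, else 0.0."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 1.0 if dist <= tol else 0.0


def relabel_with_weight(
    episode: Sequence[Transition], weight: float = 2.0, tol: float = 0.05
) -> List[Transition]:
    """Relabel a failed episode with its final achieved position as the goal,
    then scale the recomputed hindsight reward by a weighting factor."""
    hindsight_goal = episode[-1].achieved_goal
    return [
        replace(
            t,
            goal=hindsight_goal,
            # Weighted hindsight reward: larger than the unweighted reward
            # whenever the relabeled goal is reached and weight > 1.
            reward=weight * sparse_reward(t.achieved_goal, hindsight_goal, tol),
            hindsight=True,
        )
        for t in episode
    ]


# A failed two-step episode: the original goal (2, 2) is never reached.
failed_episode = [
    Transition(achieved_goal=(0.0, 0.0), goal=(2.0, 2.0), reward=0.0),
    Transition(achieved_goal=(1.0, 1.0), goal=(2.0, 2.0), reward=0.0),
]
weighted = relabel_with_weight(failed_episode, weight=2.0)
```

Under these assumptions, the final relabeled transition receives a reward of `weight * 1.0`, so a critic trained on it would be pushed toward a higher Q-value for that hindsight state-action pair than with the unweighted reward.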
Pages: 1829-1841
Page count: 13