Efficient Multi-Goal Reinforcement Learning via Value Consistency Prioritization

Cited by: 0
Authors
Xu, Jiawei [1]
Li, Shuxing [1]
Yang, Rui [2]
Yuan, Chun [1]
Han, Lei [3]
Affiliations
[1] Tsinghua Shenzhen Int Grad Sch, Shenzhen, Guangdong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[3] Tencent Robot X, Shenzhen, Guangdong, Peoples R China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Goal-conditioned reinforcement learning (RL) with sparse rewards remains a challenging problem in deep RL. Hindsight Experience Replay (HER) has been demonstrated to be an effective solution: it replaces the desired goals in failed experiences with states that were actually achieved. Existing approaches mainly focus on either exploration or exploitation to improve the performance of HER. From a joint perspective, exploiting specific past experiences can also implicitly drive exploration. Therefore, we concentrate on prioritizing both original and relabeled samples for efficient goal-conditioned RL. To achieve this, we propose a novel value consistency prioritization (VCP) method, where the priority of samples is determined by the consistency of ensemble Q-values. This distinguishes the VCP method from most existing prioritization approaches, which prioritize samples based on the uncertainty of ensemble Q-values. Through extensive experiments, we demonstrate that VCP achieves significantly higher sample efficiency than existing algorithms on a range of challenging goal-conditioned manipulation tasks. We also visualize how VCP prioritizes good experiences to enhance policy learning.
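The two mechanisms named in the abstract, hindsight relabeling and ensemble-based prioritization, can be illustrated with a minimal sketch. The abstract does not state how consistency is measured, so the inverse standard deviation across ensemble members used here is an assumption, not the paper's definition. The function names, the transition dictionary layout, the 0/-1 sparse reward convention, and the tolerance `tol` are likewise hypothetical.

```python
import random
import statistics


def relabel_with_achieved_goal(transition, achieved_goal, tol=0.05):
    """HER-style relabeling: swap the desired goal for a state that was reached."""
    relabeled = dict(transition)
    relabeled["goal"] = achieved_goal
    # Assumed sparse reward convention: 0 on success, -1 otherwise.
    dist = max(abs(a - g) for a, g in zip(transition["achieved"], achieved_goal))
    relabeled["reward"] = 0.0 if dist <= tol else -1.0
    return relabeled


def consistency_priorities(ensemble_q_values, eps=1e-6):
    """Assumed proxy for consistency: ensemble members that agree (low
    standard deviation) give a sample a higher sampling probability."""
    scores = [1.0 / (statistics.pstdev(qs) + eps) for qs in ensemble_q_values]
    total = sum(scores)
    return [s / total for s in scores]


def sample_batch(buffer, priorities, batch_size, rng=random):
    """Draw transitions with probability proportional to their priority."""
    return rng.choices(buffer, weights=priorities, k=batch_size)


# A failed transition and its relabeled copy, both kept in the buffer.
transition = {"achieved": (0.2, 0.4), "goal": (1.0, 1.0), "reward": -1.0}
relabeled = relabel_with_achieved_goal(transition, (0.2, 0.4))
buffer = [transition, relabeled]

# Hypothetical Q-value estimates from a three-member ensemble, one row per sample.
q_values = [[-9.0, -3.0, -6.0], [-0.1, -0.2, -0.1]]
probs = consistency_priorities(q_values)
batch = sample_batch(buffer, probs, batch_size=4, rng=random.Random(0))
```

An uncertainty-based scheme of the kind the abstract contrasts with VCP would instead favor samples on which the ensemble disagrees; in this sketch that would amount to inverting the scores.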
Pages: 355-376
Number of pages: 22