Efficient Multi-Goal Reinforcement Learning via Value Consistency Prioritization

Times cited: 0
Authors
Xu, Jiawei [1]
Li, Shuxing [1]
Yang, Rui [2]
Yuan, Chun [1]
Han, Lei [3]
Affiliations
[1] Tsinghua Shenzhen Int Grad Sch, Shenzhen, Guangdong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[3] Tencent Robot X, Shenzhen, Guangdong, Peoples R China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Goal-conditioned reinforcement learning (RL) with sparse rewards remains a challenging problem in deep RL. Hindsight Experience Replay (HER) has been demonstrated to be an effective solution: it replaces the desired goals in failed experiences with states that were actually achieved. Existing approaches mainly focus on either exploration or exploitation to improve the performance of HER. From a joint perspective, exploiting specific past experiences can also implicitly drive exploration. Therefore, we concentrate on prioritizing both original and relabeled samples for efficient goal-conditioned RL. To achieve this, we propose a novel value consistency prioritization (VCP) method, in which the priority of samples is determined by the consistency of ensemble Q-values. This distinguishes VCP from most existing prioritization approaches, which prioritize samples based on the uncertainty of ensemble Q-values. Through extensive experiments, we demonstrate that VCP achieves significantly higher sample efficiency than existing algorithms on a range of challenging goal-conditioned manipulation tasks. We also visualize how VCP prioritizes good experiences to enhance policy learning.
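The abstract describes the method only at a high level. The Python sketch below illustrates the two ingredients it names. The first is HER-style goal relabeling, shown here with the common "future" strategy. The second is sampling from a replay buffer in proportion to a per-sample priority computed from an ensemble of Q-value estimates. This record does not give the paper's consistency measure. The score `1 / (1 + std)` used here is an assumption for illustration only. The function names, the transition layout and the 0.05 success tolerance are also hypothetical.

```python
import math
import random

# Hypothetical transition layout: (state, action, achieved_goal, desired_goal),
# where achieved_goal is the goal-space position reached after the action.


def sparse_reward(achieved, desired, tol=0.05):
    """Sparse goal-reaching reward: 0.0 on success, -1.0 otherwise.
    The 0.05 tolerance is an illustrative assumption."""
    return 0.0 if math.dist(achieved, desired) <= tol else -1.0


def her_relabel_future(episode, t, rng):
    """HER 'future' relabeling: swap the desired goal of step t for a goal
    actually achieved at step t or later in the same episode, then
    recompute the sparse reward for the relabeled transition."""
    state, action, achieved, _desired = episode[t]
    future = rng.randint(t, len(episode) - 1)  # inclusive bounds
    new_goal = episode[future][2]
    return (state, action, achieved, new_goal), sparse_reward(achieved, new_goal)


def consistency_priority(q_values):
    """Hypothetical consistency score for one sample: 1 / (1 + std) of the
    ensemble's Q-value estimates. Members that agree give a priority near 1;
    members that disagree give a priority near 0. NOT the paper's exact formula."""
    mean = sum(q_values) / len(q_values)
    var = sum((q - mean) ** 2 for q in q_values) / len(q_values)
    return 1.0 / (1.0 + math.sqrt(var))


def sample_indices(priorities, batch_size, rng):
    """Draw buffer indices with probability proportional to priority."""
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)
```

Consider two samples. An ensemble that agrees exactly (`[1.0, 1.0, 1.0]`) gets priority 1.0. The ensemble `[0.0, 2.0]` has a standard deviation of 1.0 and gets priority 0.5. The first sample is therefore drawn about twice as often. The abstract contrasts VCP with uncertainty-based prioritization, and such a scheme would typically favor the second sample instead.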
Pages: 355-376
Page count: 22