Planning with Q-Values in Sparse Reward Reinforcement Learning

Cited by: 0
Authors
Lei, Hejun [1 ]
Weng, Paul [2 ]
Rojas, Juan [3 ]
Guan, Yisheng [1 ]
Affiliations
[1] Guangdong Univ Technol, BIRL, Guangzhou, Peoples R China
[2] Shanghai Jiao Tong Univ, UM SJTU Joint Inst, Shanghai, Peoples R China
[3] Chinese Univ Hong Kong, Sch Mech & Automat Engn, Hong Kong, Peoples R China
Keywords
Motion planning; Model-based reinforcement learning; Sparse reward;
DOI
10.1007/978-3-031-13844-7_56
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning a policy from sparse rewards is a central challenge in reinforcement learning (RL). The most successful approaches to this challenge have been model-free RL algorithms, which are sample inefficient. Model-based RL algorithms are known to be sample efficient, but few of them can handle sparse-reward settings. To address these limitations, we present PlanQ, a sample-efficient model-based RL framework for sparse-reward settings. PlanQ leverages Q-values, which encode long-term value and provide a richer feedback signal for actions than immediate rewards. Specifically, PlanQ scores rollouts generated by its learned model using returns that incorporate Q-values. We verify the efficacy of the approach on robot manipulation tasks ranging from simple to complex. Our experimental results show that PlanQ improves both performance and sample efficiency in sparse-reward settings.
Pages: 603-614
Number of pages: 12
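The abstract describes scoring rollouts generated by a learned dynamics model with returns that include Q-values. As a rough, hypothetical illustration only (not the authors' implementation), the sketch below assumes placeholder functions dynamics_model, reward_fn, and q_value, and scores each candidate action sequence by its discounted predicted rewards plus a discounted terminal Q-value, which supplies long-horizon signal when rewards are sparse.

```python
# Illustrative sketch of Q-value-augmented rollout scoring (assumed names, not the
# authors' code): the terminal Q-value closes the planning horizon so that sparse
# rewards outside the rollout still influence action selection.
import numpy as np

def score_action_sequences(state, action_seqs, dynamics_model, reward_fn, q_value, gamma=0.99):
    """Return one score per candidate action sequence."""
    scores = []
    for actions in action_seqs:                    # actions: array of shape (H, action_dim)
        s, ret, discount = state, 0.0, 1.0
        for a in actions:
            ret += discount * reward_fn(s, a)      # immediate (possibly sparse) reward
            s = dynamics_model(s, a)               # one-step prediction of the next state
            discount *= gamma
        ret += discount * q_value(s, actions[-1])  # terminal Q-value for long-term signal
        scores.append(ret)
    return np.asarray(scores)

# Toy usage with dummy model, sparse reward, and critic (all placeholders).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dynamics_model = lambda s, a: s + 0.1 * a                 # dummy linear dynamics
    reward_fn = lambda s, a: float(np.linalg.norm(s) < 0.05)  # sparse goal-reaching reward
    q_value = lambda s, a: -float(np.linalg.norm(s))          # dummy learned critic
    candidates = rng.normal(size=(64, 10, 2))                 # 64 sequences, horizon 10
    scores = score_action_sequences(np.ones(2), candidates,
                                    dynamics_model, reward_fn, q_value)
    best_sequence = candidates[np.argmax(scores)]             # pick the best-scoring rollout
```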