Planning with Q-Values in Sparse Reward Reinforcement Learning

Cited by: 0
Authors
Lei, Hejun [1 ]
Weng, Paul [2 ]
Rojas, Juan [3 ]
Guan, Yisheng [1 ]
Affiliations
[1] Guangdong Univ Technol, BIRL, Guangzhou, Peoples R China
[2] Shanghai Jiao Tong Univ, UM SJTU Joint Inst, Shanghai, Peoples R China
[3] Chinese Univ Hong Kong, Sch Mech & Automat Engn, Hong Kong, Peoples R China
Keywords
Motion planning; Model-based reinforcement learning; Sparse reward;
DOI
10.1007/978-3-031-13844-7_56
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning a policy from sparse rewards is a central challenge in reinforcement learning (RL). The most successful approaches to this challenge have been model-free RL algorithms, which are sample inefficient. Model-based RL algorithms are known to be sample efficient, but few of them can handle sparse-reward settings. To address these limitations, we present PlanQ, a sample-efficient model-based RL framework for sparse-reward settings. PlanQ leverages Q-values, which encode long-term value and provide a richer feedback signal for actions than immediate rewards. Specifically, PlanQ scores rollouts generated by its learned model using returns that incorporate Q-values. We verify the efficacy of the approach on robot manipulation tasks ranging from simple to complex. Our experimental results show that PlanQ improves both performance and sample efficiency in sparse-reward settings.
Pages: 603-614
Number of pages: 12
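The abstract describes scoring rollouts generated by a learned dynamics model with returns that include Q-values. As a rough, hypothetical illustration only (not the authors' implementation), the sketch below assumes placeholder functions dynamics_model, reward_fn, and q_value, and scores each candidate action sequence by its discounted predicted rewards plus a discounted terminal Q-value, which supplies long-horizon signal when rewards are sparse.

```python
# Illustrative sketch of Q-value-augmented rollout scoring (assumed names, not the
# authors' code): the terminal Q-value closes the planning horizon so that sparse
# rewards outside the rollout still influence action selection.
import numpy as np

def score_action_sequences(state, action_seqs, dynamics_model, reward_fn, q_value, gamma=0.99):
    """Return one score per candidate action sequence."""
    scores = []
    for actions in action_seqs:                    # actions: array of shape (H, action_dim)
        s, ret, discount = state, 0.0, 1.0
        for a in actions:
            ret += discount * reward_fn(s, a)      # immediate (possibly sparse) reward
            s = dynamics_model(s, a)               # one-step prediction of the next state
            discount *= gamma
        ret += discount * q_value(s, actions[-1])  # terminal Q-value for long-term signal
        scores.append(ret)
    return np.asarray(scores)

# Toy usage with dummy model, sparse reward, and critic (all placeholders).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dynamics_model = lambda s, a: s + 0.1 * a                 # dummy linear dynamics
    reward_fn = lambda s, a: float(np.linalg.norm(s) < 0.05)  # sparse goal-reaching reward
    q_value = lambda s, a: -float(np.linalg.norm(s))          # dummy learned critic
    candidates = rng.normal(size=(64, 10, 2))                 # 64 sequences, horizon 10
    scores = score_action_sequences(np.ones(2), candidates,
                                    dynamics_model, reward_fn, q_value)
    best_sequence = candidates[np.argmax(scores)]             # pick the best-scoring rollout
```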