Kernel Dynamic Policy Programming: Practical Reinforcement Learning for High-dimensional Robots

Cited by: 0
Authors
Cui, Yunduan
Matsubara, Takamitsu
Sugimoto, Kenji
Affiliations: (not listed)
Keywords: (not listed)
DOI: (not available)
CLC classification number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Applying value-function-based reinforcement learning algorithms to real robots has been infeasible because approximating a high-dimensional value function is difficult. The difficulty in previous methods is twofold: 1) instability of the value function approximation caused by non-smooth policy updates, and 2) computational complexity associated with the high-dimensional state-action space. To cope with these issues, in this paper we propose Kernel Dynamic Policy Programming (KDPP), which smoothly updates the value function in an implicit high-dimensional feature space. The smooth policy update is promoted by adding the Kullback-Leibler divergence between the current and updated policies to the reward function as a regularization term, which stabilizes the value function approximation. The computational complexity is reduced by applying the kernel trick in the value function approximation. KDPP can therefore be interpreted as a novel yet practical extension of Dynamic Policy Programming (DPP) and of kernelized value-function-based reinforcement learning methods that combines their strengths. We successfully applied KDPP to learn a bottle-cap unscrewing task on a Pneumatic Artificial Muscle (PAM)-driven humanoid robot hand, a system with a 24-dimensional state space, using a limited number of samples and commonplace computational resources.
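As a rough illustration of the smooth policy update described in the abstract, the sketch below implements the underlying tabular Dynamic Policy Programming recursion on a toy MDP; the KL regularization shows up as the Boltzmann-weighted "soft max" over action preferences. The two-state MDP, rewards, and the inverse temperature `eta` are invented for illustration, and the kernelized value-function approximation that distinguishes KDPP from plain DPP is omitted here.

```python
import math

# Toy deterministic 2-state, 2-action MDP (all numbers invented for
# illustration). DPP maintains action preferences Psi(s, a) rather than
# a plain value function.

def soft_value(prefs, eta):
    """Boltzmann-weighted average of action preferences for one state.
    This soft-max operator is what keeps successive policies close
    (the KL-regularization effect described in the abstract)."""
    m = max(prefs)
    w = [math.exp(eta * (p - m)) for p in prefs]
    z = sum(w)
    return sum(wi * p for wi, p in zip(w, prefs)) / z

def dpp_sweep(psi, rewards, nxt, gamma=0.9, eta=5.0):
    """One sweep of the DPP recursion:
    Psi'(s,a) = Psi(s,a) - L_eta Psi(s) + r(s,a) + gamma * L_eta Psi(s')."""
    L = [soft_value(row, eta) for row in psi]
    return [[psi[s][a] - L[s] + rewards[s][a] + gamma * L[nxt[s][a]]
             for a in range(2)] for s in range(2)]

# From either state: action 0 -> state 0 (reward 0), action 1 -> state 1 (reward 1).
nxt = [[0, 1], [0, 1]]
rewards = [[0.0, 1.0], [0.0, 1.0]]

psi = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):
    psi = dpp_sweep(psi, rewards, nxt)

# The Boltzmann policy over preferences concentrates on action 1 in both states.
policy = [max(range(2), key=lambda a: psi[s][a]) for s in range(2)]
print(policy)  # -> [1, 1]
```

In full KDPP, `psi` would be represented not as a table but as a kernel expansion over observed state-action samples, which is what makes the method tractable in the 24-dimensional state space of the robot hand experiment.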
Pages: 662 - 667
Page count: 6
Related papers (50 total)
  • [1] Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states
    Cui, Yunduan
    Matsubara, Takamitsu
    Sugimoto, Kenji
    NEURAL NETWORKS, 2017, 94 : 13 - 23
  • [2] High-dimensional Function Optimisation by Reinforcement Learning
    Wu, Q. H.
    Liao, H. L.
    2010 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2010,
  • [3] Inverse Reinforcement Learning Using Dynamic Policy Programming
    Uchibe, Eiji
    Doya, Kenji
    FOURTH JOINT IEEE INTERNATIONAL CONFERENCES ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (IEEE ICDL-EPIROB 2014), 2014, : 222 - 228
  • [4] Safe Reinforcement Learning of Dynamic High-Dimensional Robotic Tasks: Navigation, Manipulation, Interaction
    Liu, Puze
    Zhang, Kuo
    Tateo, Davide
    Jauhri, Snehal
    Hu, Zhiyuan
    Peters, Jan
    Chalvatzaki, Georgia
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 9449 - 9456
  • [5] Reinforcement learning for high-dimensional problems with symmetrical actions
    Kamal, MAS
    Murata, J
    2004 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOLS 1-7, 2004, : 6192 - 6197
  • [6] Offline reinforcement learning in high-dimensional stochastic environments
    Hêche, Félicien
    Barakat, Oussama
    Desmettre, Thibaut
    Marx, Tania
    Robert-Nicoud, Stephan
    NEURAL COMPUTING AND APPLICATIONS, 2024, 36: 585 - 598
  • [7] Offline reinforcement learning in high-dimensional stochastic environments
    Heche, Felicien
    Barakat, Oussama
    Desmettre, Thibaut
    Marx, Tania
    Robert-Nicoud, Stephan
    NEURAL COMPUTING & APPLICATIONS, 2023, 36 (2): : 585 - 598
  • [8] Challenges in High-Dimensional Reinforcement Learning with Evolution Strategies
    Mueller, Nils
    Glasmachers, Tobias
    PARALLEL PROBLEM SOLVING FROM NATURE - PPSN XV, PT II, 2018, 11102 : 411 - 423
  • [9] Emergent Solutions to High-Dimensional Multitask Reinforcement Learning
    Kelly, Stephen
    Heywood, Malcolm I.
    EVOLUTIONARY COMPUTATION, 2018, 26 (03) : 347 - 380
  • [10] High-Dimensional Stock Portfolio Trading with Deep Reinforcement Learning
    Pigorsch, Uta
    Schaefer, Sebastian
    2022 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE FOR FINANCIAL ENGINEERING AND ECONOMICS (CIFER), 2022,