Kernel Dynamic Policy Programming: Practical Reinforcement Learning for High-dimensional Robots

Cited by: 0

Authors:

Cui, Yunduan

Matsubara, Takamitsu

Sugimoto, Kenji

Source:

2016 IEEE-RAS 16TH INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS (HUMANOIDS) | 2016

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Applying value-function-based reinforcement learning algorithms to real robots has been infeasible because approximating a high-dimensional value function is difficult. In previous methods, this difficulty is twofold: 1) instability of the value function approximation caused by non-smooth policy updates and 2) the computational complexity associated with a high-dimensional state-action space. To cope with these issues, we propose Kernel Dynamic Policy Programming (KDPP), which smoothly updates the value function in an implicit high-dimensional feature space. The smooth policy update is promoted by adding the Kullback-Leibler divergence between the current and updated policies to the reward function as a regularization term, which stabilizes the value function approximation. The computational complexity is reduced by applying the kernel trick in the value function approximation. KDPP can therefore be interpreted as a novel yet practical extension of Dynamic Policy Programming (DPP) and of kernelized value-function-based reinforcement learning methods that combines their strengths. We successfully applied KDPP to learn to unscrew a bottle cap with a humanoid robot hand driven by Pneumatic Artificial Muscles (PAMs), a system with a 24-dimensional state space, using a limited number of samples and commonplace computational resources.
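
The smooth, KL-regularized policy update described in the abstract can be illustrated with a minimal tabular sketch of the Dynamic Policy Programming recursion that KDPP extends. This is not the kernelized method from the paper: it uses a lookup table instead of a kernel approximator, and the function names, the toy two-state MDP, and the values of `eta`, `gamma`, and `iters` are assumptions made only for illustration.

```python
import numpy as np

def boltzmann_average(psi, eta):
    """Return the Boltzmann-weighted average of action preferences and the weights.

    psi has shape (S, A); the weights are the soft-max policy exp(eta * psi) / Z.
    """
    z = eta * (psi - psi.max(axis=-1, keepdims=True))  # shift for numerical stability
    w = np.exp(z)
    w /= w.sum(axis=-1, keepdims=True)
    return (w * psi).sum(axis=-1), w

def dpp_tabular(P, R, gamma=0.95, eta=1.0, iters=200):
    """Tabular DPP sketch. P: (S, A, S) transition probabilities, R: (S, A) rewards."""
    psi = np.zeros_like(R, dtype=float)  # action preferences
    for _ in range(iters):
        m, _ = boltzmann_average(psi, eta)  # shape (S,)
        # Preference update: subtract the soft value at s, add reward and
        # the discounted expected soft value at the next state.
        psi = psi - m[:, None] + R + gamma * (P @ m)
    _, policy = boltzmann_average(psi, eta)
    return psi, policy

# Hypothetical toy MDP: action 1 always leads to state 1 and pays reward 1,
# action 0 always leads to state 0 and pays nothing.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
R = np.array([[0.0, 1.0], [0.0, 1.0]])

psi, policy = dpp_tabular(P, R)
print(policy)
```

In this sketch the inverse temperature `eta` controls how strongly each update may move the policy away from the previous one: smaller values give smoother changes. Replacing the table `psi` with a kernel-based approximator is the step that the abstract attributes to KDPP and is not shown here.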

Pages: 662-667

Page count: 6

[21] Machine learning for high-dimensional dynamic stochastic economies
Scheidegger, Simon
Bilionis, Ilias
JOURNAL OF COMPUTATIONAL SCIENCE, 2019, 33 : 68 - 82
[22] Fast High-Dimensional Kernel Filtering
Nair, Pravin
Chaudhury, Kunal Narayan
IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (02) : 377 - 381
[23] APPLICATION OF ITERATIVE DYNAMIC-PROGRAMMING TO VERY HIGH-DIMENSIONAL SYSTEMS
LUUS, R
HUNGARIAN JOURNAL OF INDUSTRIAL CHEMISTRY, 1993, 21 (04) : 243 - 250
[24] Abstraction from demonstration for efficient reinforcement learning in high-dimensional domains
Cobo, Luis C.
Subramanian, Kaushik
Isbell, Charles L., Jr.
Lanterman, Aaron D.
Thomaz, Andrea L.
ARTIFICIAL INTELLIGENCE, 2014, 216 : 103 - 128
[25] High-dimensional reinforcement learning for optimization and control of ultracold quantum gases
Milson, N.
Tashchilina, A.
Ooi, T.
Czarnecka, A.
Ahmad, Z. F.
Leblanc, L. J.
MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2023, 4 (04)
[26] Integration of genetic programming and reinforcement learning for real robots
Kamio, S
Mitsuhashi, H
Iba, H
GENETIC AND EVOLUTIONARY COMPUTATION - GECCO 2003, PT I, PROCEEDINGS, 2003, 2723 : 470 - 482
[27] Deep kernel learning of dynamical models from high-dimensional noisy data
Nicolò Botteghi
Mengwu Guo
Christoph Brune
Scientific Reports, 12
[28] Deep kernel learning of dynamical models from high-dimensional noisy data
Botteghi, Nicolo
Guo, Mengwu
Brune, Christoph
SCIENTIFIC REPORTS, 2022, 12 (01)
[29] HIGH-DIMENSIONAL A-LEARNING FOR OPTIMAL DYNAMIC TREATMENT REGIMES
Shi, Chengchun
Fan, Ailin
Song, Rui
Lu, Wenbin
ANNALS OF STATISTICS, 2018, 46 (03) : 925 - 957
[30] Efficient dynamic programming for high-dimensional, optimal motion planning by spectral learning of approximate value function symmetries
Vernaza, Paul
Lee, Daniel D.
2011 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2011