Kernel Dynamic Policy Programming: Practical Reinforcement Learning for High-dimensional Robots

Cited by: 0
Authors:
Cui, Yunduan
Matsubara, Takamitsu
Sugimoto, Kenji
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Applying value-function-based reinforcement learning algorithms to real robots has been infeasible because approximating a high-dimensional value function is difficult. The difficulty in previous methods is twofold: 1) instability of the value function approximation caused by non-smooth policy updates, and 2) computational complexity associated with a high-dimensional state-action space. To cope with these issues, we propose Kernel Dynamic Policy Programming (KDPP), which smoothly updates the value function in an implicit high-dimensional feature space. The smooth policy update is promoted by adding the Kullback-Leibler divergence between the current and updated policies to the reward function as a regularization term, which stabilizes the value function approximation. The computational complexity is reduced by applying the kernel trick in the value function approximation. KDPP can therefore be interpreted as a novel yet practical extension of Dynamic Policy Programming (DPP) and of kernelized value-function-based reinforcement learning methods that combines their strengths. We successfully applied KDPP to learn to unscrew a bottle cap with a humanoid robot hand driven by pneumatic artificial muscles (PAMs), a system with a 24-dimensional state space, using a limited number of samples and commonplace computational resources.
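
The abstract describes two mechanisms: a smooth, KL-regularized policy update inherited from DPP, and a kernelized value function approximation. Below is a minimal tabular Python sketch of the first mechanism only, assuming the standard action-preference formulation of DPP; the toy MDP, hyperparameters, and variable names are hypothetical, and the paper's kernelized, high-dimensional version is not reproduced.

    # Minimal sketch of the DPP-style smooth update that KDPP kernelizes.
    # Assumption: the standard action-preference form of DPP; the toy MDP,
    # hyperparameters, and names here are hypothetical, for illustration only.
    import numpy as np

    n_states, n_actions = 5, 3
    eta, gamma = 1.0, 0.95                 # inverse temperature of the KL term; discount
    psi = np.zeros((n_states, n_actions))  # action preferences Psi(s, a)

    def soft_avg(prefs):
        # Boltzmann-weighted average of preferences: a soft maximum that
        # arises from penalizing the KL divergence to the current policy.
        w = np.exp(eta * (prefs - prefs.max()))  # shift for numerical stability
        w /= w.sum()
        return float(w @ prefs)

    def dpp_update(s, a, r, s_next):
        # Unlike Q-learning's hard max, the soft operator keeps successive
        # policies close in KL divergence, giving a smooth, stable update.
        psi[s, a] += r + gamma * soft_avg(psi[s_next]) - soft_avg(psi[s])

    # Hypothetical usage on random transitions of the toy MDP:
    rng = np.random.default_rng(0)
    for _ in range(2000):
        s, a = rng.integers(n_states), rng.integers(n_actions)
        r = 1.0 if s == n_states - 1 else 0.0
        dpp_update(s, a, r, rng.integers(n_states))

    policy = np.exp(eta * (psi - psi.max(axis=1, keepdims=True)))
    policy /= policy.sum(axis=1, keepdims=True)  # Boltzmann policy from preferences

Per the abstract, KDPP would replace the explicit preference table above with a kernel expansion of the form Psi(s, a) = sum_i alpha_i k((s, a), (s_i, a_i)), so the same update operates on expansion coefficients and the cost scales with the number of samples rather than the feature dimension.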
Pages: 662-667 (6 pages)