Kernel Dynamic Policy Programming: Practical Reinforcement Learning for High-dimensional Robots

Cited by: 0
Authors
Cui, Yunduan
Matsubara, Takamitsu
Sugimoto, Kenji
Institutions
Keywords
DOI
None available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Applying value-function-based reinforcement learning algorithms to real robots has been infeasible because approximating a high-dimensional value function is difficult. The difficulty with such high-dimensional value function approximation in previous methods is twofold: 1) instability of the value function approximation caused by non-smooth policy updates and 2) computational complexity associated with the high-dimensional state-action space. To cope with these issues, in this paper we propose Kernel Dynamic Policy Programming (KDPP), which smoothly updates the value function in an implicit high-dimensional feature space. The smooth policy update is promoted by adding the Kullback-Leibler divergence between the current and updated policies to the reward function as a regularization term, stabilizing the value function approximation. The computational complexity is reduced by applying the kernel trick in the value function approximation. KDPP can therefore be interpreted as a novel yet practical extension of Dynamic Policy Programming (DPP) and kernelized value-function-based reinforcement learning methods that combines their strengths. We successfully applied KDPP to learn to unscrew a bottle cap with a Pneumatic Artificial Muscles (PAMs) driven humanoid robot hand, a system with a 24-dimensional state space, using a limited number of samples and commonplace computational resources.
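The KL-regularized, smooth policy update described in the abstract can be illustrated with a minimal tabular sketch of the underlying Dynamic Policy Programming (DPP) iteration that KDPP kernelizes. Everything below is an illustrative assumption, not the paper's setup: the toy two-state MDP, the temperature `eta`, the discount `gamma`, and all function names are hypothetical, and the kernel-based feature space of KDPP is replaced by a plain table.

```python
import numpy as np

def soft_max(psi_row, eta):
    """Boltzmann soft-max (1/eta) * log sum_a exp(eta * psi), stabilized."""
    m = psi_row.max()
    return m + np.log(np.exp(eta * (psi_row - m)).sum()) / eta

def boltzmann_policy(psi, eta):
    """Policy pi(a|s) proportional to exp(eta * psi(s, a))."""
    z = eta * (psi - psi.max(axis=1, keepdims=True))
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

gamma, eta = 0.9, 1.0  # illustrative values
# Deterministic toy MDP: in state 0, action 1 reaches absorbing state 1
# and earns reward 1; every other transition earns reward 0.
next_state = np.array([[0, 1], [1, 1]])
reward = np.array([[0.0, 1.0], [0.0, 0.0]])

psi = np.zeros((2, 2))  # action preferences Psi(s, a)
for _ in range(200):
    L = np.array([soft_max(psi[s], eta) for s in range(2)])
    # DPP update: Psi += r + gamma * L(s') - L(s). The subtracted L(s)
    # term is what implements the KL-divergence penalty between current
    # and updated policies, keeping each policy update smooth.
    psi = psi + reward + gamma * L[next_state] - L[:, None]

pi = boltzmann_policy(psi, eta)
```

After the iterations, the policy concentrates on the rewarding action in state 0 while remaining uniform in the absorbing state, where both actions are equivalent. KDPP performs this same style of update, but with the preferences represented in an implicit feature space via the kernel trick rather than a table.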
Pages: 662-667
Page count: 6
Related papers
50 items in total
  • [31] High-Dimensional Experimental Design and Kernel Bandits
    Camilleri, Romain
    Katz-Samuels, Julian
    Jamieson, Kevin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [32] Feedforward neural networks in reinforcement learning applied to high-dimensional motor control
    Coulom, R
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2002, 2533 : 403 - 413
  • [33] NEURAL DISCRETE ABSTRACTION OF HIGH-DIMENSIONAL SPACES: A CASE STUDY IN REINFORCEMENT LEARNING
    Giannakopoulos, Petros
    Pikrakis, Aggelos
    Cotronis, Yannis
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 1517 - 1521
  • [34] WrapperRL: Reinforcement Learning Agent for Feature Selection in High-Dimensional Industrial Data
    Shaer, Ibrahim
    Shami, Abdallah
    IEEE ACCESS, 2024, 12 : 128338 - 128348
  • [35] Deep reinforcement learning for irrigation scheduling using high-dimensional sensor feedback
    Saikai, Yuji
    Peake, Allan
    Chenu, Karine
    PLOS WATER, 2023, 2 (09):
  • [36] Learning high-dimensional data
    Verleysen, M
    LIMITATIONS AND FUTURE TRENDS IN NEURAL COMPUTATION, 2003, 186 : 141 - 162
  • [37] Dual representations for dynamic programming and reinforcement learning
    Wang, Tao
    Bowling, Michael
    Schuurmans, Dale
    2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 44 - +
  • [38] Learning to Control Complex Robots Using High-Dimensional Body-Machine Interfaces
    Lee, Jongmin M.
    Gebrekristos, Temesgen
    De Santis, Dalia
    Nejati-Javaremi, Mahdieh
    Gopinath, Deepak
    Parikh, Biraj
    Mussa-Ivaldi, Ferdinando A.
    Argall, Brenna D.
    ACM TRANSACTIONS ON HUMAN-ROBOT INTERACTION, 2024, 13 (03)
  • [39] Entropy-Isomap: Manifold Learning for High-dimensional Dynamic Processes
    Schoeneman, Frank
    Chandola, Varun
    Napp, Nils
    Wodo, Olga
    Zola, Jaroslaw
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 1655 - 1660
  • [40] Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits
    Ren, Zhimei
    Zhou, Zhengyuan
    MANAGEMENT SCIENCE, 2024, 70 (02) : 1315 - 1342