Kernel Dynamic Policy Programming: Practical Reinforcement Learning for High-dimensional Robots

Cited by: 0
Authors:
Cui, Yunduan
Matsubara, Takamitsu
Sugimoto, Kenji
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Applying value-function-based reinforcement learning algorithms to real robots has been infeasible because approximating a high-dimensional value function is difficult. The difficulty in previous methods is twofold: 1) instability of the value function approximation caused by non-smooth policy updates, and 2) computational complexity associated with a high-dimensional state-action space. To cope with these issues, we propose Kernel Dynamic Policy Programming (KDPP), which smoothly updates the value function in an implicit high-dimensional feature space. The smooth policy update is promoted by adding the Kullback-Leibler divergence between the current and updated policies to the reward function as a regularization term, which stabilizes the value function approximation. The computational complexity is reduced by applying the kernel trick in the value function approximation. KDPP can therefore be interpreted as a novel yet practical extension of Dynamic Policy Programming (DPP) and of kernelized value-function-based reinforcement learning methods that combines their strengths. We successfully applied KDPP to learn to unscrew a bottle cap with a humanoid robot hand driven by pneumatic artificial muscles (PAMs), a system with a 24-dimensional state space, using a limited number of samples and commonplace computational resources.
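
The abstract describes two mechanisms: a smooth, KL-regularized policy update inherited from DPP, and a kernelized value function approximation. Below is a minimal tabular Python sketch of the first mechanism only, assuming the standard action-preference formulation of DPP; the toy MDP, hyperparameters, and variable names are hypothetical, and the paper's kernelized, high-dimensional version is not reproduced.

    # Minimal sketch of the DPP-style smooth update that KDPP kernelizes.
    # Assumption: the standard action-preference form of DPP; the toy MDP,
    # hyperparameters, and names here are hypothetical, for illustration only.
    import numpy as np

    n_states, n_actions = 5, 3
    eta, gamma = 1.0, 0.95                 # inverse temperature of the KL term; discount
    psi = np.zeros((n_states, n_actions))  # action preferences Psi(s, a)

    def soft_avg(prefs):
        # Boltzmann-weighted average of preferences: a soft maximum that
        # arises from penalizing the KL divergence to the current policy.
        w = np.exp(eta * (prefs - prefs.max()))  # shift for numerical stability
        w /= w.sum()
        return float(w @ prefs)

    def dpp_update(s, a, r, s_next):
        # Unlike Q-learning's hard max, the soft operator keeps successive
        # policies close in KL divergence, giving a smooth, stable update.
        psi[s, a] += r + gamma * soft_avg(psi[s_next]) - soft_avg(psi[s])

    # Hypothetical usage on random transitions of the toy MDP:
    rng = np.random.default_rng(0)
    for _ in range(2000):
        s, a = rng.integers(n_states), rng.integers(n_actions)
        r = 1.0 if s == n_states - 1 else 0.0
        dpp_update(s, a, r, rng.integers(n_states))

    policy = np.exp(eta * (psi - psi.max(axis=1, keepdims=True)))
    policy /= policy.sum(axis=1, keepdims=True)  # Boltzmann policy from preferences

Per the abstract, KDPP would replace the explicit preference table above with a kernel expansion of the form Psi(s, a) = sum_i alpha_i k((s, a), (s_i, a_i)), so the same update operates on expansion coefficients and the cost scales with the number of samples rather than the feature dimension.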
Pages: 662-667 (6 pages)