Kernel Dynamic Policy Programming: Practical Reinforcement Learning for High-dimensional Robots

Cited by: 0
Authors
Cui, Yunduan
Matsubara, Takamitsu
Sugimoto, Kenji
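Affiliations
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes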
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Applying value-function-based reinforcement learning algorithms to real robots has been infeasible because approximating high-dimensional value functions is difficult. In previous methods the difficulty is twofold: 1) instability of the value function approximation caused by non-smooth policy updates, and 2) the computational complexity associated with a high-dimensional state-action space. To cope with these issues, we propose Kernel Dynamic Policy Programming (KDPP), which smoothly updates the value function in an implicit high-dimensional feature space. The smooth policy update is promoted by adding the Kullback-Leibler divergence between the current and updated policies to the reward function as a regularization term, which stabilizes the value function approximation. The computational complexity is reduced by applying the kernel trick in the value function approximation. KDPP can therefore be interpreted as a novel yet practical extension that combines the strengths of Dynamic Policy Programming (DPP) and kernelized value-function-based reinforcement learning methods. We successfully applied KDPP to learn to unscrew a bottle cap with a humanoid robot hand driven by Pneumatic Artificial Muscles (PAMs), a system with a 24-dimensional state space, using a limited number of samples and commonplace computational resources.
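The smooth, KL-regularized update described in the abstract can be illustrated with the tabular DPP recursion from the DPP literature. The sketch below is not KDPP itself: it uses no kernel and no function approximation. The two-state MDP, the function names, and the values of gamma and eta are all assumptions chosen for illustration. The policy is a Boltzmann distribution over action preferences Psi, and eta controls how sharply the policy may change between iterations.

```python
import math


def boltzmann_policy(prefs, eta):
    """Softmax policy: pi(a|s) proportional to exp(eta * Psi(s, a))."""
    m = max(prefs)  # subtract the max for numerical stability
    weights = [math.exp(eta * (p - m)) for p in prefs]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_mean(prefs, eta):
    """Boltzmann soft-max operator: policy-weighted mean of the preferences."""
    pi = boltzmann_policy(prefs, eta)
    return sum(p * q for p, q in zip(pi, prefs))


def dpp_sweep(psi, rewards, transitions, gamma, eta):
    """One synchronous tabular DPP update over all state-action pairs.

    Psi'(s, a) = Psi(s, a) - M Psi(s) + r(s, a) + gamma * E_s'[M Psi(s')]
    where M is the Boltzmann soft-max operator above.
    """
    soft = [boltzmann_mean(row, eta) for row in psi]
    new_psi = []
    for s, row in enumerate(psi):
        new_row = []
        for a, pref in enumerate(row):
            expected_next = sum(
                prob * soft[s2] for s2, prob in transitions[s][a].items()
            )
            new_row.append(pref - soft[s] + rewards[s][a] + gamma * expected_next)
        new_psi.append(new_row)
    return new_psi


# Hypothetical toy MDP (an assumption, not from the paper):
# action 0 stays in the current state, action 1 moves to the other state.
transitions = [
    [{0: 1.0}, {1: 1.0}],  # from state 0
    [{1: 1.0}, {0: 1.0}],  # from state 1
]
rewards = [
    [0.0, 0.0],  # state 0: no reward
    [1.0, 0.0],  # state 1: reward 1 for staying
]

psi = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):
    psi = dpp_sweep(psi, rewards, transitions, gamma=0.9, eta=1.0)

policy = [boltzmann_policy(row, 1.0) for row in psi]
```

In this toy problem the policy should come to prefer moving out of state 0 and staying in state 1. Because each sweep shifts the preferences incrementally, the policy changes gradually rather than jumping to a greedy choice, which is the property the abstract credits with stabilizing the value function approximation.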
Pages: 662-667
Number of pages: 6