Kernel Dynamic Policy Programming: Practical Reinforcement Learning for High-dimensional Robots

Cited by: 0
Authors
Cui, Yunduan
Matsubara, Takamitsu
Sugimoto, Kenji
Affiliations: (not listed)
Keywords: (not listed)
DOI: (not available)
CLC classification number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Applying value-function-based reinforcement learning algorithms to real robots has been infeasible because approximating a high-dimensional value function is difficult. The difficulty in previous methods is twofold: 1) instability of the value function approximation caused by non-smooth policy updates, and 2) computational complexity associated with the high-dimensional state-action space. To cope with these issues, in this paper we propose Kernel Dynamic Policy Programming (KDPP), which smoothly updates the value function in an implicit high-dimensional feature space. The smooth policy update is promoted by adding the Kullback-Leibler divergence between the current and updated policies to the reward function as a regularization term, which stabilizes the value function approximation. The computational complexity is reduced by applying the kernel trick in the value function approximation. KDPP can therefore be interpreted as a novel yet practical extension of Dynamic Policy Programming (DPP) and of kernelized value-function-based reinforcement learning methods that combines their strengths. We successfully applied KDPP to learn a bottle-cap unscrewing task on a Pneumatic Artificial Muscle (PAM)-driven humanoid robot hand, a system with a 24-dimensional state space, using a limited number of samples and commonplace computational resources.
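As a rough illustration of the smooth policy update described in the abstract, the sketch below implements the underlying tabular Dynamic Policy Programming recursion on a toy MDP; the KL regularization shows up as the Boltzmann-weighted "soft max" over action preferences. The two-state MDP, rewards, and the inverse temperature `eta` are invented for illustration, and the kernelized value-function approximation that distinguishes KDPP from plain DPP is omitted here.

```python
import math

# Toy deterministic 2-state, 2-action MDP (all numbers invented for
# illustration). DPP maintains action preferences Psi(s, a) rather than
# a plain value function.

def soft_value(prefs, eta):
    """Boltzmann-weighted average of action preferences for one state.
    This soft-max operator is what keeps successive policies close
    (the KL-regularization effect described in the abstract)."""
    m = max(prefs)
    w = [math.exp(eta * (p - m)) for p in prefs]
    z = sum(w)
    return sum(wi * p for wi, p in zip(w, prefs)) / z

def dpp_sweep(psi, rewards, nxt, gamma=0.9, eta=5.0):
    """One sweep of the DPP recursion:
    Psi'(s,a) = Psi(s,a) - L_eta Psi(s) + r(s,a) + gamma * L_eta Psi(s')."""
    L = [soft_value(row, eta) for row in psi]
    return [[psi[s][a] - L[s] + rewards[s][a] + gamma * L[nxt[s][a]]
             for a in range(2)] for s in range(2)]

# From either state: action 0 -> state 0 (reward 0), action 1 -> state 1 (reward 1).
nxt = [[0, 1], [0, 1]]
rewards = [[0.0, 1.0], [0.0, 1.0]]

psi = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):
    psi = dpp_sweep(psi, rewards, nxt)

# The Boltzmann policy over preferences concentrates on action 1 in both states.
policy = [max(range(2), key=lambda a: psi[s][a]) for s in range(2)]
print(policy)  # -> [1, 1]
```

In full KDPP, `psi` would be represented not as a table but as a kernel expansion over observed state-action samples, which is what makes the method tractable in the 24-dimensional state space of the robot hand experiment.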
Pages: 662 - 667
Page count: 6
Related papers (50 total)
  • [1] Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states
    Cui, Yunduan
    Matsubara, Takamitsu
    Sugimoto, Kenji
    NEURAL NETWORKS, 2017, 94 : 13 - 23
  • [2] High-dimensional Function Optimisation by Reinforcement Learning
    Wu, Q. H.
    Liao, H. L.
    2010 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2010,
  • [3] Inverse Reinforcement Learning Using Dynamic Policy Programming
    Uchibe, Eiji
    Doya, Kenji
    FOURTH JOINT IEEE INTERNATIONAL CONFERENCES ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (IEEE ICDL-EPIROB 2014), 2014, : 222 - 228
  • [4] Safe Reinforcement Learning of Dynamic High-Dimensional Robotic Tasks: Navigation, Manipulation, Interaction
    Liu, Puze
    Zhang, Kuo
    Tateo, Davide
    Jauhri, Snehal
    Hu, Zhiyuan
    Peters, Jan
    Chalvatzaki, Georgia
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 9449 - 9456
  • [5] Reinforcement learning for high-dimensional problems with symmetrical actions
    Kamal, MAS
    Murata, J
    2004 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOLS 1-7, 2004, : 6192 - 6197
  • [6] Offline reinforcement learning in high-dimensional stochastic environments
    Hêche, Félicien
    Barakat, Oussama
    Desmettre, Thibaut
    Marx, Tania
    Robert-Nicoud, Stephan
    NEURAL COMPUTING AND APPLICATIONS, 2024, 36: 585 - 598
  • [7] Offline reinforcement learning in high-dimensional stochastic environments
    Heche, Felicien
    Barakat, Oussama
    Desmettre, Thibaut
    Marx, Tania
    Robert-Nicoud, Stephan
    NEURAL COMPUTING & APPLICATIONS, 2023, 36 (2): : 585 - 598
  • [8] Challenges in High-Dimensional Reinforcement Learning with Evolution Strategies
    Mueller, Nils
    Glasmachers, Tobias
    PARALLEL PROBLEM SOLVING FROM NATURE - PPSN XV, PT II, 2018, 11102 : 411 - 423
  • [9] Emergent Solutions to High-Dimensional Multitask Reinforcement Learning
    Kelly, Stephen
    Heywood, Malcolm I.
    EVOLUTIONARY COMPUTATION, 2018, 26 (03) : 347 - 380
  • [10] High-Dimensional Stock Portfolio Trading with Deep Reinforcement Learning
    Pigorsch, Uta
    Schaefer, Sebastian
    2022 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE FOR FINANCIAL ENGINEERING AND ECONOMICS (CIFER), 2022,