Kernel Dynamic Policy Programming: Practical Reinforcement Learning for High-dimensional Robots

Cited by: 0
Authors
Cui, Yunduan
Matsubara, Takamitsu
Sugimoto, Kenji
Institutions
Keywords
DOI
None available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Applying value-function-based reinforcement learning algorithms to real robots has been infeasible because approximating a high-dimensional value function is difficult. The difficulty with such high-dimensional value function approximation in previous methods is twofold: 1) instability of the value function approximation caused by non-smooth policy updates and 2) computational complexity associated with the high-dimensional state-action space. To cope with these issues, in this paper we propose Kernel Dynamic Policy Programming (KDPP), which smoothly updates the value function in an implicit high-dimensional feature space. The smooth policy update is promoted by adding the Kullback-Leibler divergence between the current and updated policies to the reward function as a regularization term, stabilizing the value function approximation. The computational complexity is reduced by applying the kernel trick in the value function approximation. KDPP can therefore be interpreted as a novel yet practical extension of Dynamic Policy Programming (DPP) and kernelized value-function-based reinforcement learning methods that combines their strengths. We successfully applied KDPP to learn to unscrew a bottle cap with a Pneumatic Artificial Muscles (PAMs) driven humanoid robot hand, a system with a 24-dimensional state space, using a limited number of samples and commonplace computational resources.
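The KL-regularized, smooth policy update described in the abstract can be illustrated with a minimal tabular sketch of the underlying Dynamic Policy Programming (DPP) iteration that KDPP kernelizes. Everything below is an illustrative assumption, not the paper's setup: the toy two-state MDP, the temperature `eta`, the discount `gamma`, and all function names are hypothetical, and the kernel-based feature space of KDPP is replaced by a plain table.

```python
import numpy as np

def soft_max(psi_row, eta):
    """Boltzmann soft-max (1/eta) * log sum_a exp(eta * psi), stabilized."""
    m = psi_row.max()
    return m + np.log(np.exp(eta * (psi_row - m)).sum()) / eta

def boltzmann_policy(psi, eta):
    """Policy pi(a|s) proportional to exp(eta * psi(s, a))."""
    z = eta * (psi - psi.max(axis=1, keepdims=True))
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

gamma, eta = 0.9, 1.0  # illustrative values
# Deterministic toy MDP: in state 0, action 1 reaches absorbing state 1
# and earns reward 1; every other transition earns reward 0.
next_state = np.array([[0, 1], [1, 1]])
reward = np.array([[0.0, 1.0], [0.0, 0.0]])

psi = np.zeros((2, 2))  # action preferences Psi(s, a)
for _ in range(200):
    L = np.array([soft_max(psi[s], eta) for s in range(2)])
    # DPP update: Psi += r + gamma * L(s') - L(s). The subtracted L(s)
    # term is what implements the KL-divergence penalty between current
    # and updated policies, keeping each policy update smooth.
    psi = psi + reward + gamma * L[next_state] - L[:, None]

pi = boltzmann_policy(psi, eta)
```

After the iterations, the policy concentrates on the rewarding action in state 0 while remaining uniform in the absorbing state, where both actions are equivalent. KDPP performs this same style of update, but with the preferences represented in an implicit feature space via the kernel trick rather than a table.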
Pages: 662-667
Page count: 6
Related papers
50 items in total
  • [31] High-Dimensional Experimental Design and Kernel Bandits
    Camilleri, Romain
    Katz-Samuels, Julian
    Jamieson, Kevin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [32] Feedforward neural networks in reinforcement learning applied to high-dimensional motor control
    Coulom, R
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2002, 2533 : 403 - 413
  • [33] NEURAL DISCRETE ABSTRACTION OF HIGH-DIMENSIONAL SPACES: A CASE STUDY IN REINFORCEMENT LEARNING
    Giannakopoulos, Petros
    Pikrakis, Aggelos
    Cotronis, Yannis
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 1517 - 1521
  • [34] WrapperRL: Reinforcement Learning Agent for Feature Selection in High-Dimensional Industrial Data
    Shaer, Ibrahim
    Shami, Abdallah
    IEEE ACCESS, 2024, 12 : 128338 - 128348
  • [35] Deep reinforcement learning for irrigation scheduling using high-dimensional sensor feedback
    Saikai, Yuji
    Peake, Allan
    Chenu, Karine
    PLOS WATER, 2023, 2 (09):
  • [36] Learning high-dimensional data
    Verleysen, M
    LIMITATIONS AND FUTURE TRENDS IN NEURAL COMPUTATION, 2003, 186 : 141 - 162
  • [37] Dual representations for dynamic programming and reinforcement learning
    Wang, Tao
    Bowling, Michael
    Schuurmans, Dale
    2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 44 - +
  • [38] Learning to Control Complex Robots Using High-Dimensional Body-Machine Interfaces
    Lee, Jongmin M.
    Gebrekristos, Temesgen
    De Santis, Dalia
    Nejati-Javaremi, Mahdieh
    Gopinath, Deepak
    Parikh, Biraj
    Mussa-Ivaldi, Ferdinando A.
    Argall, Brenna D.
    ACM TRANSACTIONS ON HUMAN-ROBOT INTERACTION, 2024, 13 (03)
  • [39] Entropy-Isomap: Manifold Learning for High-dimensional Dynamic Processes
    Schoeneman, Frank
    Chandola, Varun
    Napp, Nils
    Wodo, Olga
    Zola, Jaroslaw
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 1655 - 1660
  • [40] Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits
    Ren, Zhimei
    Zhou, Zhengyuan
    MANAGEMENT SCIENCE, 2024, 70 (02) : 1315 - 1342