Stability Control of a Biped Robot on a Dynamic Platform Based on Hybrid Reinforcement Learning

Cited by: 5
Authors
Xi, Ao [1 ]
Chen, Chao [1 ]
Affiliations
[1] Monash Univ, Lab Mot Generat & Anal, Fac Engn, Clayton, Vic 3800, Australia
Keywords
biped robot; reinforcement learning; stability control; Gaussian processes; DQN (lambda);
DOI
10.3390/s20164468
CLC number
O65 [Analytical Chemistry];
Discipline codes
070302 ; 081704 ;
Abstract
In this work, we introduce a novel hybrid reinforcement learning scheme to balance a biped robot (NAO) on an oscillating platform, where the rotation of the platform is treated as an external disturbance to the robot. The platform has two rotational degrees of freedom, pitch and roll. The state space comprises the position of the center of pressure together with the joint angles and joint velocities of the two legs; the action space consists of the joint angles of the ankles, knees, and hips. By incorporating inverse kinematics, the dimension of the action space is significantly reduced. A model-based system estimator is then employed during offline training to estimate the dynamics model of the system using novel hierarchical Gaussian processes and to provide initial control inputs, after which a reduced action space for each joint is obtained by minimizing the cost of reaching the desired stable state. Finally, a model-free optimizer based on DQN(lambda) fine-tunes the initial control inputs, yielding the optimal control input for each joint at any state. The proposed scheme not only avoids the distribution-mismatch problem but also improves sample efficiency. Simulation results show that the proposed hybrid reinforcement learning mechanism enables the NAO robot to balance on an oscillating platform across different oscillation frequencies and magnitudes, with both control performance and robustness maintained throughout the experiments.
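The abstract describes a two-stage pipeline: a model-based stage that fits a Gaussian-process dynamics estimator to propose initial control inputs, followed by a model-free DQN(lambda) stage that fine-tunes them. The sketch below is purely illustrative and is not the authors' implementation: it pairs a minimal single-layer GP regressor (standing in for the paper's hierarchical Gaussian processes) with a tabular Q(lambda) update (standing in for DQN(lambda)); all function names, hyperparameters, and the tabular simplification are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the row vectors of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_query, noise=1e-3):
    """GP posterior mean: a stand-in for the model-based dynamics estimator.

    X_train: (n, d) visited state-action features; y_train: (n,) observed
    next-state quantities (e.g. center-of-pressure displacement).
    """
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    return rbf_kernel(X_query, X_train) @ np.linalg.solve(K, y_train)

def q_lambda_update(Q, E, s, a, r, s_next, alpha=0.1, gamma=0.95, lam=0.8):
    """One tabular Q(lambda) step, fine-tuning the model-based initial policy.

    Q: (n_states, n_actions) action values; E: eligibility traces of the
    same shape, crediting recently visited state-action pairs.
    """
    delta = r + gamma * Q[s_next].max() - Q[s, a]  # TD error
    E[s, a] += 1.0                                 # accumulate eligibility
    Q += alpha * delta * E                         # update all traced pairs
    E *= gamma * lam                               # decay traces
    return Q, E
```

In the paper's setting the GP stage would be trained offline on logged platform/robot transitions, and the Q(lambda)-style stage would then refine the per-joint control inputs online; here both are reduced to their simplest runnable forms.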
Pages: 1-21
Page count: 21
Related papers
50 records in total
  • [21] Dynamic control of a biped walking robot
    Löffler, K.; Gienger, M.; Pfeiffer, F.
    ZEITSCHRIFT FUR ANGEWANDTE MATHEMATIK UND MECHANIK, 2000, 80: S357-S358
  • [22] Reinforcement learning for a biped robot to climb sloping surfaces
    Salatian, A. W.; Yi, K. Y.; Zheng, Y. F.
    JOURNAL OF ROBOTIC SYSTEMS, 1997, 14 (04): 283-296
  • [23] Reinforcement learning for a CPG-driven biped robot
    Mori, T.; Nakamura, Y.; Sato, M.; Ishii, S.
    PROCEEDINGS OF THE NINETEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE SIXTEENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2004: 623-630
  • [24] Fuzzy control for dynamic biped robot based on adaptive network
    Huai, Chuangfeng; Fang, Yuefa; Guo, Sheng
    Beijing Jiaotong Daxue Xuebao/Journal of Beijing Jiaotong University, 2008, 32 (01): 108-111
  • [25] Biped dynamic walking using reinforcement learning
    Benbrahim, H.; Franklin, J. A.
    ROBOTICS AND AUTONOMOUS SYSTEMS, 1997, 22 (3-4): 283-302
  • [26] Reinforcement learning for a biped robot based on a CPG-actor-critic method
    Nakamura, Yutaka; Mori, Takeshi; Sato, Masa-Aki; Ishii, Shin
    NEURAL NETWORKS, 2007, 20 (06): 723-735
  • [27] Reinforcement learning method-based stable gait synthesis for biped robot
    Hu, L. Y.; Sun, Z. Q.
    2004 8TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION, VOLS 1-3, 2004: 1017-1022
  • [28] Reinforcement learning for platform-independent visual robot control
    Muse, David; Burn, Kevin; Wermter, Stefan
    2006 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORK PROCEEDINGS, VOLS 1-10, 2006: 2459+
  • [29] Hybrid dynamic control algorithm for humanoid robots based on reinforcement learning
    Katic, Dusko M.; Rodic, Aleksandar D.; Vukobratovic, Miomir K.
    JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2008, 51 (01): 3-30