Balance Control of a Biped Robot on a Rotating Platform Based on Efficient Reinforcement Learning

Cited by: 28

Authors:

Xi, Ao [1]

Mudiyanselage, Thushal Wijekoon [2]

Tao, Dacheng [3,4]

Chen, Chao [1]

Affiliations:

[1] Monash Univ, Dept Mech & Aerosp Engn, Clayton, Vic 3800, Australia

[2] Monash Univ, Clayton, Vic 3800, Australia

[3] Univ Sydney, UBTECH Sydney Artificial Intelligence Ctr, Fac Engn, Sydney, NSW 2008, Australia

[4] Univ Sydney, Sch Comp Sci, Fac Engn, Sydney, NSW 2008, Australia

Source:

IEEE-CAA JOURNAL OF AUTOMATICA SINICA | 2019 / Vol. 6 / No. 04

Keywords:

Biped robot; Gaussian processes (GP); reinforcement learning; temporal difference; HUMANOID ROBOT;

DOI:

10.1109/JAS.2019.1911567

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this work, we combined model-based reinforcement learning (MBRL) and model-free reinforcement learning (MFRL) to stabilize a biped robot (NAO robot) on a rotating platform, where the angular velocity of the platform is unknown to the proposed learning algorithm and is treated as an external disturbance. Nonparametric Gaussian processes normally require a large number of training data points to deal with the discontinuity of the estimated model. Although some improved methods, such as probabilistic inference for learning control (PILCO), do not require an explicit global model because the actions are obtained by directly searching the policy space, overfitting and a lack of model complexity may still result in a large deviation between the prediction and the real system. Besides, none of these approaches considers the data error during training or the measurement noise during testing. We propose a hierarchical Gaussian process (GP) model containing two layers of independent GPs, from which a physically continuous probability transition model of the robot is obtained. Owing to the physically continuous estimation, the algorithm overcomes the overfitting problem with a guaranteed model complexity, and the amount of training data required is also reduced. The policy for any given initial state is generated automatically by minimizing the expected cost according to the predefined cost function and the obtained probability distribution of the state. Furthermore, a novel Q(lambda)-based MFRL scheme is employed to improve the policy. Simulation results show that the proposed RL algorithm is able to balance the NAO robot on a rotating platform and can adapt to a platform with varying angular velocity.
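
As a rough illustration of the kind of probabilistic transition model the abstract refers to, the following is a minimal sketch of standard single-layer GP regression on a toy one-step dynamics function. It is not the authors' hierarchical two-layer model. The function names (`rbf_kernel`, `gp_predict`), the toy dynamics `next_angle = angle + 0.1 * action`, and all hyperparameter values are assumptions made for this example only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    # Clip tiny negative values caused by floating-point round-off.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-4):
    """Return the GP posterior mean and variance at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train)
    chol = np.linalg.cholesky(k_tt)
    # alpha = (K + noise_var * I)^{-1} y, computed via the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)

# Hypothetical toy dynamics: next_angle = angle + 0.1 * action.
rng = np.random.default_rng(0)
inputs = rng.uniform(-1.0, 1.0, size=(30, 2))   # columns: angle, action
targets = inputs[:, 0] + 0.1 * inputs[:, 1]

query = np.array([[0.2, -0.5]])                 # true next angle is 0.15
mean, var = gp_predict(inputs, targets, query)
print("predicted next angle:", mean[0], "variance:", var[0])
```

The predictive variance is what distinguishes this from a plain regressor: a policy search that minimizes expected cost can use it to account for model uncertainty in regions with little training data.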


Pages: 938 - 951

Page count: 14

50 items in total

[1] Balance Control of a Biped Robot on a Rotating Platform Based on Efficient Reinforcement Learning
Ao Xi
Thushal Wijekoon Mudiyanselage
Dacheng Tao
Chao Chen
[J]. IEEE/CAA Journal of Automatica Sinica, 2019, 6 (04) : 938 - 951
[2] Gait Balance of Biped Robot based on Reinforcement Learning
Hwang, Kao-Shing
Li, Jhe-Syun
Jiang, Wei-Cheng
Wang, Wei-Han
[J]. 2013 PROCEEDINGS OF SICE ANNUAL CONFERENCE (SICE), 2013, : 435 - 439
[3] Stability Control of a Biped Robot on a Dynamic Platform Based on Hybrid Reinforcement Learning
Xi, Ao
Chen, Chao
[J]. SENSORS, 2020, 20 (16) : 1 - 21
[4] Walking Control of a Biped Robot on Static and Rotating Platforms Based on Hybrid Reinforcement Learning
Xi, Ao
Chen, Chao
[J]. IEEE ACCESS, 2020, 8 : 148411 - 148424
[5] Biped Balance Control by Reinforcement Learning
Hwang, Kao-Shing
Lin, Jin-Ling
Li, Jhe-Syun
[J]. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2016, 32 (04) : 1041 - 1060
[6] Posture self-stabilizer of a biped robot based on training platform and reinforcement learning
Wu, Weiguo
Gao, Liyang
[J]. ROBOTICS AND AUTONOMOUS SYSTEMS, 2017, 98 : 42 - 55
[7] Dynamic balance of a biped robot using fuzzy reinforcement learning agents
Zhou, CJ
Meng, QC
[J]. FUZZY SETS AND SYSTEMS, 2003, 134 (01) : 169 - 187
[8] A Disturbance Rejection Control Method Based on Deep Reinforcement Learning for a Biped Robot
Liu, Chuzhao
Gao, Junyao
Tian, Dingkui
Zhang, Xuefeng
Liu, Huaxin
Meng, Libo
[J]. APPLIED SCIENCES-BASEL, 2021, 11 (04) : 1 - 17
[9] Learning an Efficient Gait Cycle of a Biped Robot Based on Reinforcement Learning and Artificial Neural Networks
Gil, Cristyan R.
Calvo, Hiram
Sossa, Humberto
[J]. APPLIED SCIENCES-BASEL, 2019, 9 (03):
[10] Deep reinforcement learning method for biped robot gait control
Feng, Chun
Zhang, Yiwei
Huang, Cheng
Jiang, Wenbiao
Wu, Zhiwei
[J]. Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2021, 27 (08) : 2341 - 2349
