Policy gradient reinforcement learning for fast quadrupedal locomotion

Cited by: 249
Authors
Kohl, N [1]
Stone, P [1]
Affiliations
[1] Univ Texas, Dept Comp Sci, Austin, TX 78712 USA
Keywords
learning control; walking robots; multi legged robots
DOI
10.1109/ROBOT.2004.1307456
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper presents a machine learning approach to optimizing a quadrupedal trot gait for forward speed. Given a parameterized walk designed for a specific robot, we propose using a form of policy gradient reinforcement learning to automatically search the set of possible parameters with the goal of finding the fastest possible walk. We implement and test our approach on a commercially available quadrupedal robot platform, namely the Sony Aibo robot. After about three hours of learning, all on the physical robots and with no human intervention other than to change the batteries, the robots achieved a gait faster than any previously known gait for the Aibo, significantly outperforming a variety of existing hand-coded and learned solutions.
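The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a policy gradient search over walk parameters could look, assuming a finite-difference gradient estimate. The objective `simulated_speed`, the three-parameter vector, and every constant (`epsilon`, `num_policies`, `step_size`, `iterations`) are hypothetical stand-ins chosen for illustration; on a real robot the evaluation would be a timed walk rather than a formula.

```python
import math
import random


def simulated_speed(params):
    # Hypothetical stand-in for timing a physical walk:
    # the score is highest when every parameter equals 1.0.
    return -sum((p - 1.0) ** 2 for p in params)


def policy_gradient_step(params, evaluate, rng,
                         epsilon=0.05, num_policies=15, step_size=0.1):
    # Sample nearby policies: each parameter is shifted by -epsilon, 0, or +epsilon.
    trials = []
    for _ in range(num_policies):
        deltas = [rng.choice((-epsilon, 0.0, epsilon)) for _ in params]
        candidate = [p + d for p, d in zip(params, deltas)]
        trials.append((deltas, evaluate(candidate)))

    gradient = []
    for i in range(len(params)):
        # Group the scores by the direction in which parameter i was perturbed.
        groups = {-1: [], 0: [], 1: []}
        for deltas, score in trials:
            sign = (deltas[i] > 0) - (deltas[i] < 0)
            groups[sign].append(score)
        if not groups[1] or not groups[-1]:
            gradient.append(0.0)  # not enough samples to compare directions
            continue
        avg_plus = sum(groups[1]) / len(groups[1])
        avg_minus = sum(groups[-1]) / len(groups[-1])
        if groups[0]:
            avg_zero = sum(groups[0]) / len(groups[0])
            if avg_zero > avg_plus and avg_zero > avg_minus:
                gradient.append(0.0)  # leaving this parameter alone looks best
                continue
        gradient.append(avg_plus - avg_minus)

    # Move a fixed distance along the normalized gradient estimate.
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm == 0.0:
        return list(params)
    return [p + step_size * g / norm for p, g in zip(params, gradient)]


def optimize(initial, evaluate, iterations=200, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    params = list(initial)
    for _ in range(iterations):
        params = policy_gradient_step(params, evaluate, rng)
    return params
```

Calling `optimize([0.0, 0.0, 0.0], simulated_speed)` moves the parameter vector toward the region where the toy objective peaks. Because the step length is fixed, the result hovers near the optimum rather than converging exactly.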
Pages: 2619-2624
Page count: 6
Related papers
50 in total
  • [41] Policy Gradient using Weak Derivatives for Reinforcement Learning
    Bhatt, Sujay
    Koppel, Alec
    Krishnamurthy, Vikram
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 5531 - 5537
  • [42] Evolution-Guided Policy Gradient in Reinforcement Learning
    Khadka, Shauharda
    Tumer, Kagan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [43] Fuzzy Baselines to Stabilize Policy Gradient Reinforcement Learning
    Surita, Gabriela
    Lemos, Andre
    Gomide, Fernando
    [J]. EXPLAINABLE AI AND OTHER APPLICATIONS OF FUZZY TECHNIQUES, NAFIPS 2021, 2022, 258 : 436 - 446
  • [44] Policy gradient methods for reinforcement learning with function approximation
    Sutton, RS
    McAllester, D
    Singh, S
    Mansour, Y
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 12, 2000, 12 : 1057 - 1063
  • [45] Inverse Reinforcement Learning through Policy Gradient Minimization
    Pirotta, Matteo
    Restelli, Marcello
    [J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1993 - 1999
  • [46] Batch Reinforcement Learning With a Nonparametric Off-Policy Policy Gradient
    Tosatto, Samuele
    Carvalho, Joao
    Peters, Jan
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (10) : 5996 - 6010
  • [47] Meta Reinforcement Learning of Locomotion Policy for Quadruped Robots With Motor Stuck
    Chen, Ci
    Li, Chao
    Lu, Haojian
    Wang, Yue
    Xiong, Rong
    [J]. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024
  • [48] Optimal quadrupedal locomotion
    Srinivasan, M.
    [J]. INTEGRATIVE AND COMPARATIVE BIOLOGY, 2016, 56 : E209 - E209
  • [49] THE DYNAMICS OF QUADRUPEDAL LOCOMOTION
    PANDY, MG
    KUMAR, V
    BERME, N
    WALDRON, KJ
    [J]. JOURNAL OF BIOMECHANICAL ENGINEERING-TRANSACTIONS OF THE ASME, 1988, 110 (03): : 230 - 237
  • [50] Understanding quadrupedal locomotion
    Weijs, WA
    [J]. EUROPEAN JOURNAL OF MORPHOLOGY, 1998, 36 (4-5): : 270 - 271