Trajectory-based Probabilistic Policy Gradient for Learning Locomotion Behaviors

Cited by: 0
Authors
Choi, Sungjoon [1 ]
Kim, Joohyung [1 ]
Affiliations
[1] Disney Res, 521 Circle Seven Dr, Glendale, CA 91201 USA
Keywords
OPTIMIZATION;
DOI
10.1109/icra.2019.8794207
CLC classification: TP [Automation Technology, Computer Technology]
Discipline code: 0812
Abstract
In this paper, we propose a trajectory-based reinforcement learning method named deep latent policy gradient (DLPG) for learning locomotion skills. We define the policy as a probability distribution over trajectories and train it with a deep latent variable model to achieve sample-efficient skill learning. We first evaluate the sample efficiency of DLPG against state-of-the-art reinforcement learning methods in simulated environments. We then apply the proposed method to a four-legged walking robot named Snapbot to learn three basic locomotion skills: turning left, going straight, and turning right. We demonstrate that, by properly designing two reward functions for curriculum learning, Snapbot successfully learns the desired locomotion skills with moderate sample complexity.
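The abstract describes the core idea at a high level: the policy is a distribution over whole trajectories, trajectories are decoded from a latent code, and the policy is updated with trajectory-level returns. The sketch below is not the authors' DLPG implementation; it assumes a toy linear decoder in place of the deep latent variable model and a stand-in reward function (rollout_return) in place of a Snapbot or simulator rollout, purely to illustrate a trajectory-level policy gradient taken in latent space.

import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM, HORIZON, ACT_DIM = 4, 20, 2   # assumed sizes, not taken from the paper

# Toy linear "decoder" from a latent code z to a whole open-loop action trajectory.
W = 0.1 * rng.standard_normal((LATENT_DIM, HORIZON * ACT_DIM))

def decode(z):
    # Map a latent code to a (HORIZON x ACT_DIM) action trajectory.
    return (z @ W).reshape(HORIZON, ACT_DIM)

def rollout_return(traj):
    # Stand-in reward: favor trajectories whose net action sum points toward (1, 0).
    # A real implementation would execute traj on the robot or simulator instead.
    return -float(np.linalg.norm(traj.sum(axis=0) - np.array([1.0, 0.0])))

# Policy = isotropic Gaussian over latent codes; REINFORCE update on its mean.
mu, sigma, lr, batch = np.zeros(LATENT_DIM), 0.5, 0.05, 32
for it in range(200):
    zs = mu + sigma * rng.standard_normal((batch, LATENT_DIM))    # sample latent codes
    returns = np.array([rollout_return(decode(z)) for z in zs])   # trajectory-level returns
    adv = returns - returns.mean()                                # baseline subtraction
    # Gradient of log N(z; mu, sigma^2 I) with respect to mu is (z - mu) / sigma^2.
    grad = ((zs - mu) / sigma**2 * adv[:, None]).mean(axis=0)
    mu += lr * grad                                               # gradient ascent on expected return
print("return at the latent mean after training:", rollout_return(decode(mu)))

In the paper itself the decoder is a deep latent variable model trained together with the policy, and returns come from executing the decoded trajectories in simulation or on Snapbot; the sketch only mirrors the trajectory-level update structure.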
Pages: 1-7 (7 pages)
Related papers (50 records; first 10 shown)
  • [1] Trajectory-Based Off-Policy Deep Reinforcement Learning
    Doerr, Andreas
    Volpp, Michael
    Toussaint, Marc
    Trimpe, Sebastian
    Daniel, Christian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [2] Trajectory-Based Modified Policy Iteration
    Sharma, R.
    Gopal, M.
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 12, 2006, 12: 103+
  • [3] A unified probabilistic approach for modeling trajectory-based separations
    Wei, J
    Realff, MJ
    AICHE JOURNAL, 2005, 51 (09) : 2507 - 2520
  • [4] Framework for trajectory-based probabilistic security assessment of power systems
    Perkin, Samuel
    Hamon, Camille
    Kristjansson, Ragnar
    Stefansson, Hlynur
    Jensson, Pall
    IET GENERATION TRANSMISSION & DISTRIBUTION, 2019, 13 (07) : 1088 - 1094
  • [5] Learning CPG-based biped locomotion with a policy gradient method
    Matsubara, T
    Morimoto, J
    Nakanishi, J
    Sato, M
    Doya, K
    2005 5TH IEEE-RAS INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS, 2005, : 208 - 213
  • [7] Learning CPG-based biped locomotion with a policy gradient method
    Matsubara, Takamitsu
    Morimoto, Jun
    Nakanishi, Jun
    Sato, Masa-aki
    Doya, Kenji
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2006, 54 (11) : 911 - 920
  • [8] Trajectory-based Split Hindsight Reverse Curriculum Learning
    Wu, Jiaxi
    Zhang, Dianmin
    Zhong, Shanlin
    Qiao, Hong
    2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021, : 3971 - 3978
  • [9] Policy gradient reinforcement learning for fast quadrupedal locomotion
    Kohl, N
    Stone, P
    2004 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1- 5, PROCEEDINGS, 2004, : 2619 - 2624
  • [10] Trajectory-based codes
    Michael Domaratzki
    Acta Informatica, 2004, 40 : 491 - 527