Trajectory-based Probabilistic Policy Gradient for Learning Locomotion Behaviors

Cited by: 0

Authors:

Choi, Sungjoon [1]

Kim, Joohyung [1]

Affiliation:

[1] Disney Res, 521 Circle Seven Dr, Glendale, CA 91201 USA

Source:

2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA) | 2019

Keywords:

OPTIMIZATION;

DOI:

10.1109/icra.2019.8794207

Chinese Library Classification (CLC):

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In this paper, we propose a trajectory-based reinforcement learning method named deep latent policy gradient (DLPG) for learning locomotion skills. We define the policy function as a probability distribution over trajectories and train the policy using a deep latent variable model to achieve sample-efficient skill learning. We first evaluate the sample efficiency of DLPG against state-of-the-art reinforcement learning methods in simulated environments. We then apply the proposed method to a four-legged walking robot named Snapbot to learn three basic locomotion skills: turning left, going straight, and turning right. We demonstrate that, with two properly designed reward functions for curriculum learning, Snapbot successfully learns the desired locomotion skills with moderate sample complexity.

Pages: 1 / 7

Page count: 7
