Generalized Model Learning for Reinforcement Learning on a Humanoid Robot

Cited by: 42

Authors:

Hester, Todd [1]

Quinlan, Michael [1]

Stone, Peter [1]

Affiliations:

[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA

Source:

2010 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA) | 2010

Funding:

National Science Foundation (USA);

Keywords:

DOI:

10.1109/ROBOT.2010.5509181

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Reinforcement learning (RL) algorithms have long been promising methods for enabling an autonomous robot to improve its behavior on sequential decision-making tasks. The obvious enticement is that the robot should be able to improve its own behavior without the need for detailed step-by-step programming. However, for RL to reach its full potential, the algorithms must be sample efficient: they must learn competent behavior from very few real-world trials. From this perspective, model-based methods, which use experiential data more efficiently than model-free approaches, are appealing. But they often require exhaustive exploration to learn an accurate model of the domain. In this paper, we present an algorithm, Reinforcement Learning with Decision Trees (RL-DT), that uses decision trees to learn the model by generalizing the relative effect of actions across states. The agent explores the environment until it believes it has a reasonable policy. The combination of the learning approach with the targeted exploration policy enables fast learning of the model. We compare RL-DT against standard model-free and model-based learning methods, and demonstrate its effectiveness on an Aldebaran Nao humanoid robot scoring goals in a penalty kick scenario.
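
The idea of learning a model of the relative effect of actions can be illustrated with a minimal sketch. This is not the authors' implementation: the 10-cell corridor, the greedy Gini-based tree, and every name below (`step`, `build_tree`, `predict_next`) are assumptions made for illustration only. A small tree is trained to predict the change in position (next minus current) from a handful of visited cells, and the learned effect is then applied to cells the agent never visited.

```python
from collections import Counter

SIZE = 10  # hypothetical corridor with cells 0..9 and walls at both ends


def step(pos, action):
    """Toy ground-truth dynamics: move by action (-1 or +1), blocked by walls."""
    return min(max(pos + action, 0), SIZE - 1)


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels):
    """Greedily grow a classification tree using midpoint threshold splits."""
    if len(set(labels)) == 1:
        return labels[0]  # pure leaf
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # identical inputs with mixed labels: majority leaf
        return Counter(labels).most_common(1)[0][0]
    _, f, t = best
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))


def predict(tree, row):
    """Walk from the root to a leaf and return the predicted label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Experience from only five of the ten cells, both actions tried in each.
visited = [0, 1, 2, 8, 9]
rows = [(p, a) for p in visited for a in (-1, 1)]
# The target is the relative effect (next - current), not the absolute next cell.
deltas = [step(p, a) - p for p, a in rows]
model = build_tree(rows, deltas)


def predict_next(pos, action):
    """Predicted next cell: current position plus the learned relative effect."""
    return pos + predict(model, (pos, action))
```

Because the target is a relative change rather than an absolute next state, the same leaf ("moving right shifts the position by +1") covers visited and unvisited cells alike, so in this toy setting `predict_next(5, 1)` agrees with `step(5, 1)` even though cell 5 was never sampled. Where the tree places its wall boundaries depends entirely on which cells were visited, so predictions near unexplored walls can be wrong.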

Pages: 2369-2374

Page count: 6
