Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution

Cited by: 0

Authors:

Patil, Vihang [1,2]

Hofmarcher, Markus [1,2]

Dinu, Marius-Constantin [1,2,3]

Dorfer, Matthias [4]

Blies, Patrick [4]

Brandstetter, Johannes [1,2,5]

Arjona-Medina, Jose [1,2,3]

Hochreiter, Sepp [1,2,6]

Affiliations:

[1] Johannes Kepler Univ Linz, Inst Machine Learning, ELLIS Unit Linz, Linz, Austria

[2] Johannes Kepler Univ Linz, Inst Machine Learning, LIT AI Lab, Linz, Austria

[3] Dynatrace Res, Linz, Austria

[4] EnliteAI, Vienna, Austria

[5] Microsoft Res, Redmond, WA USA

[6] Inst Adv Res Artificial Intelligence, Vienna, Austria

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 | 2022

Funding:

European Union Horizon 2020;

Keywords:

MULTIPLE SEQUENCE ALIGNMENT; NEURAL-NETWORKS; ALGORITHM; SEARCH;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Reinforcement learning algorithms require many samples when solving complex hierarchical tasks with sparse and delayed rewards. For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks. However, often only few episodes with high rewards are available as demonstrations since current exploration strategies cannot discover them in reasonable time. In this work, we introduce Align-RUDDER, which utilizes a profile model for reward redistribution that is obtained from multiple sequence alignment of demonstrations. Consequently, Align-RUDDER employs reward redistribution effectively and, thereby, drastically improves learning on few demonstrations. Align-RUDDER outperforms competitors on complex artificial tasks with delayed rewards and few demonstrations. On the Minecraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently. Code is available at github.com/ml-jku/align-rudder.
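The idea of redistributing a delayed return along an episode by scoring it against a profile built from demonstrations can be illustrated with a toy sketch. This is not the authors' implementation (see the repository named in the abstract for that). It assumes the demonstrations are already aligned, have equal length, and contain no gaps, so no actual multiple sequence alignment is performed. The function names `build_profile` and `redistribute`, the pseudocount, the uniform background, and the clipping of negative scores are all illustrative assumptions.

```python
from collections import Counter
import math


def build_profile(aligned_demos, alphabet, pseudocount=1.0):
    """Position-specific log-odds scores from pre-aligned, equal-length demos.

    Assumption: a uniform background distribution over `alphabet`.
    """
    background = 1.0 / len(alphabet)
    total = len(aligned_demos) + pseudocount * len(alphabet)
    profile = []
    for pos in range(len(aligned_demos[0])):
        counts = Counter(demo[pos] for demo in aligned_demos)
        profile.append({
            sym: math.log(((counts[sym] + pseudocount) / total) / background)
            for sym in alphabet
        })
    return profile


def redistribute(events, profile, episode_return):
    """Spread `episode_return` over the steps of one episode.

    Each step is scored against the profile column at the same position.
    Negative scores are clipped to zero (an illustrative choice), and the
    remaining scores are rescaled so the step rewards sum to the return.
    """
    scores = [max(profile[t][e], 0.0) for t, e in enumerate(events)]
    total = sum(scores)
    if total == 0.0:
        # Nothing matches the profile: keep the reward delayed at the end.
        return [0.0] * (len(events) - 1) + [episode_return]
    return [episode_return * s / total for s in scores]


# Hypothetical event sequences: each letter stands for one sub-task event.
demos = ["ABC", "ABC", "ABC"]
profile = build_profile(demos, alphabet="ABCX")
print(redistribute("AXC", profile, episode_return=2.0))  # [1.0, 0.0, 1.0]
```

In this toy setting the steps that agree with the demonstrations receive the reward immediately, while the deviating step receives none, and the per-step rewards still sum to the original return.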


Pages: 42

50 items in total

[21] Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences
Biyik, Erdem
Losey, Dylan P.
Palan, Malayandi
Landolfi, Nicholas C.
Shevchuk, Gleb
Sadigh, Dorsa
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2022, 41 (01) : 45 - 67
[22] Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach
Zhang, Yudi
Du, Yali
Huang, Biwei
Wang, Ziyan
Wang, Jun
Fang, Meng
Pechenizkiy, Mykola
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[23] Reward Redistribution for Reinforcement Learning of Dynamic Nonprehensile Manipulation
Sejnova, Gabriela
Mejdrechova, Megi
Otahal, Marek
Sokovnin, Nikita
Farkas, Igor
Vavrecka, Michal
2021 7TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND ROBOTICS (ICCAR), 2021, : 326 - 331
[24] Sparse Reward based Manipulator Motion Planning by Using High Speed Learning from Demonstrations
Zuo, Guoyu
Lu, Jiahao
Pan, Tingting
2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO), 2018, : 518 - 523
[25] Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments
Rengarajan, Desik
Chaudhary, Sapana
Kim, Jaewon
Kalathil, Dileep
Shakkottai, Srinivas
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
[26] Learning From Sparse Demonstrations
Jin, Wanxin
Murphey, Todd D.
Kulic, Dana
Ezer, Neta
Mou, Shaoshuai
IEEE TRANSACTIONS ON ROBOTICS, 2023, 39 (01) : 645 - 664
[27] Learning to Generalize from Demonstrations
Browne, Katie
Nicolescu, Monica
CYBERNETICS AND INFORMATION TECHNOLOGIES, 2012, 12 (03) : 27 - 38
[28] Learning from Corrective Demonstrations
Gutierrez, Reymundo A.
Short, Elaine Schaertl
Niekum, Scott
Thomaz, Andrea L.
HRI '19: 2019 14TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2019, : 712 - 714
[29] Joint Estimation of Expertise and Reward Preferences From Human Demonstrations
Carreno-Medrano, Pamela
Smith, Stephen L.
Kulic, Dana
IEEE TRANSACTIONS ON ROBOTICS, 2023, 39 (01) : 681 - 698
[30] Learning and generalization of task-parameterized skills through few human demonstrations
Prados, Adrian
Garrido, Santiago
Barber, Ramon
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
