Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution

Cited: 0
Authors
Patil, Vihang [1 ,2 ]
Hofmarcher, Markus [1 ,2 ]
Dinu, Marius-Constantin [1 ,2 ,3 ]
Dorfer, Matthias [4 ]
Blies, Patrick [4 ]
Brandstetter, Johannes [1 ,2 ,5 ]
Arjona-Medina, Jose [1 ,2 ,3 ]
Hochreiter, Sepp [1 ,2 ,6 ]
Affiliations
[1] Johannes Kepler Univ Linz, Inst Machine Learning, ELLIS Unit Linz, Linz, Austria
[2] Johannes Kepler Univ Linz, Inst Machine Learning, LIT AI Lab, Linz, Austria
[3] Dynatrace Res, Linz, Austria
[4] EnliteAI, Vienna, Austria
[5] Microsoft Res, Redmond, WA USA
[6] Inst Adv Res Artificial Intelligence, Vienna, Austria
Funding
EU Horizon 2020;
Keywords
MULTIPLE SEQUENCE ALIGNMENT; NEURAL-NETWORKS; ALGORITHM; SEARCH;
DOI
N/A
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning algorithms require many samples when solving complex hierarchical tasks with sparse and delayed rewards. For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks. However, often only a few episodes with high rewards are available as demonstrations, since current exploration strategies cannot discover them in reasonable time. In this work, we introduce Align-RUDDER, which utilizes a profile model for reward redistribution that is obtained from multiple sequence alignment of demonstrations. Consequently, Align-RUDDER employs reward redistribution effectively and thereby drastically improves learning from few demonstrations. Align-RUDDER outperforms competitors on complex artificial tasks with delayed rewards and few demonstrations. On the Minecraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently. Code is available at github.com/ml-jku/align-rudder.
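The abstract's central mechanism, redistributing a sparse episodic reward over time steps according to how much each step advances an episode toward the profile of successful demonstrations, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes demonstrations are already aligned to equal length and uses raw position-wise event frequencies in place of the paper's position-specific scoring matrix from multiple sequence alignment. All function names are hypothetical.

```python
from collections import Counter

def build_profile(demonstrations):
    """Position-wise event counts over (pre-aligned, equal-length)
    demonstration sequences -- a crude stand-in for a profile model
    obtained via multiple sequence alignment."""
    length = len(demonstrations[0])
    return [Counter(seq[t] for seq in demonstrations) for t in range(length)]

def prefix_score(profile, episode, t):
    """Score the prefix episode[:t+1] against the profile: the sum of
    relative frequencies of the observed event at each position."""
    n = sum(profile[0].values())  # number of demonstrations
    return sum(profile[i][e] / n
               for i, e in enumerate(episode[:t + 1]) if i < len(profile))

def redistribute(profile, episode, total_reward):
    """Spread the episodic reward over steps in proportion to the
    increase in alignment score each step contributes (RUDDER-style
    return decomposition)."""
    scores = [prefix_score(profile, episode, t) for t in range(len(episode))]
    diffs = [scores[0]] + [scores[t] - scores[t - 1]
                           for t in range(1, len(scores))]
    norm = sum(diffs)
    return [total_reward * d / norm if norm else 0.0 for d in diffs]
```

With two identical demonstrations ["wood", "wood", "stone"] and a matching episode, the episodic reward is spread evenly over all three steps, while a step whose event never occurs at that position in the demonstrations (e.g. "dirt" at position 1) receives zero redistributed reward.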
Pages: 42
Related Papers
50 records in total
  • [41] Objective learning from human demonstrations
    Lin, Jonathan Feng-Shun
    Carreno-Medrano, Pamela
    Parsapour, Mahsa
    Sakr, Maram
    Kulic, Dana
    ANNUAL REVIEWS IN CONTROL, 2021, 51 : 111 - 129
  • [42] V-MIN: Efficient Reinforcement Learning through Demonstrations and Relaxed Reward Demands
    Martinez, David
    Alenya, Guillem
    Torras, Carme
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 2857 - 2863
  • [43] Human2bot: learning zero-shot reward functions for robotic manipulation from human demonstrations
    Salam, Yasir
    Li, Yinbei
    Herzog, Jonas
    Yang, Jiaqiang
    Autonomous Robots, 2025, 49 (2)
  • [44] Cross attention redistribution with contrastive learning for few shot object detection
    Quan, Jianing
    Ge, Baozhen
    Chen, Lei
    DISPLAYS, 2022, 72
  • [45] Bayesian inverse reinforcement learning for demonstrations of an expert in multiple dynamics: Toward estimation of transferable reward
    Yusuke N.
    Sachiyo A.
    Transactions of the Japanese Society for Artificial Intelligence, 2020, 35 (01)
  • [46] Adversarial Imitation Learning from Incomplete Demonstrations
    Sun, Mingfei
    Ma, Xiaojuan
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 3513 - 3519
  • [47] Deep Q-Learning from Demonstrations
    Hester, Todd
    Vecerik, Matej
    Pietquin, Olivier
    Lanctot, Marc
    Schaul, Tom
    Piot, Bilal
    Horgan, Dan
    Quan, John
    Sendonaris, Andrew
    Osband, Ian
    Dulac-Arnold, Gabriel
    Agapiou, John
    Leibo, Joel Z.
    Gruslys, Audrunas
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 3223 - 3230
  • [48] Robust Imitation Learning from Noisy Demonstrations
    Tangkaratt, Voot
    Charoenphakdee, Nontawat
    Sugiyama, Masashi
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 298 - +
  • [49] Learning Dialog Policies from Weak Demonstrations
    Gordon-Hall, Gabriel
    Gorinski, Philip John
    Cohen, Shay B.
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 1394 - 1405
  • [50] Learning Periodic Tasks from Human Demonstrations
    Yang, Jingyun
    Zhang, Junwu
    Settle, Connor
    Rai, Akshara
    Antonova, Rika
    Bohg, Jeannette
    2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022, : 8658 - 8665