Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution

Cited by: 0
Authors
Patil, Vihang [1,2]
Hofmarcher, Markus [1,2]
Dinu, Marius-Constantin [1,2,3]
Dorfer, Matthias [4]
Blies, Patrick [4]
Brandstetter, Johannes [1,2,5]
Arjona-Medina, Jose [1,2,3]
Hochreiter, Sepp [1,2,6]
Affiliations
[1] Johannes Kepler Univ Linz, Inst Machine Learning, ELLIS Unit Linz, Linz, Austria
[2] Johannes Kepler Univ Linz, Inst Machine Learning, LIT AI Lab, Linz, Austria
[3] Dynatrace Res, Linz, Austria
[4] EnliteAI, Vienna, Austria
[5] Microsoft Res, Redmond, WA USA
[6] Inst Adv Res Artificial Intelligence, Vienna, Austria
Funding
European Union Horizon 2020
Keywords
MULTIPLE SEQUENCE ALIGNMENT; NEURAL-NETWORKS; ALGORITHM; SEARCH;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning algorithms require many samples when solving complex hierarchical tasks with sparse and delayed rewards. For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks. However, often only a few episodes with high rewards are available as demonstrations, since current exploration strategies cannot discover them in reasonable time. In this work, we introduce Align-RUDDER, which utilizes a profile model for reward redistribution that is obtained from multiple sequence alignment of demonstrations. Consequently, Align-RUDDER employs reward redistribution effectively and thereby drastically improves learning from few demonstrations. Align-RUDDER outperforms competitors on complex artificial tasks with delayed rewards and few demonstrations. On the Minecraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently. Code is available at github.com/ml-jku/align-rudder.
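The abstract describes redistributing a delayed episodic reward over steps according to how much each step advances a trajectory along a profile built from aligned demonstrations. The Python sketch below illustrates that idea only; it assumes demonstrations are already discretized into equal-length event sequences (the paper obtains these via clustering and multiple sequence alignment, omitted here), and all names (build_profile, redistribute_reward, the toy Minecraft-style events) are illustrative rather than the authors' implementation.

    import numpy as np

    def build_profile(demos, alphabet, pseudo=1.0):
        # Position-specific event frequencies from demonstrations that are
        # assumed to be pre-aligned to a common length (MSA step omitted).
        length = len(demos[0])
        counts = np.full((length, len(alphabet)), pseudo)
        idx = {e: i for i, e in enumerate(alphabet)}
        for demo in demos:
            for t, event in enumerate(demo):
                counts[t, idx[event]] += 1.0
        return counts / counts.sum(axis=1, keepdims=True), idx

    def profile_score(prefix, profile, idx):
        # Log-odds score of an event-sequence prefix against the profile,
        # relative to a uniform background over the alphabet.
        return sum(np.log(profile[t, idx[e]]) - np.log(1.0 / len(idx))
                   for t, e in enumerate(prefix))

    def redistribute_reward(episode, episode_return, profile, idx):
        # Spread the delayed episodic return over steps in proportion to
        # the per-step increase in alignment score (score differences).
        # Assumes the episode's event sequence matches the profile length.
        scores = [0.0] + [profile_score(episode[:t + 1], profile, idx)
                          for t in range(len(episode))]
        diffs = np.clip(np.diff(scores), 0.0, None)  # keep only progress
        if diffs.sum() == 0.0:
            return np.zeros(len(episode))
        return episode_return * diffs / diffs.sum()  # sums to the return

    # Toy demonstrations: sub-task events on the way to a diamond.
    alphabet = ["log", "planks", "stick", "pickaxe", "diamond"]
    demos = [["log", "planks", "stick", "pickaxe", "diamond"],
             ["log", "planks", "stick", "pickaxe", "diamond"],
             ["log", "planks", "planks", "pickaxe", "diamond"]]
    profile, idx = build_profile(demos, alphabet)
    episode = ["log", "planks", "stick", "pickaxe", "diamond"]
    print(redistribute_reward(episode, 1.0, profile, idx))

Normalizing the clipped score differences so they sum to the episodic return keeps the total reward unchanged while moving credit to the steps that complete demonstrated sub-tasks, which is the intuition the abstract states for why learning from few demonstrations improves.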
Pages: 42
Related Papers
50 records in total
• [1] Eteke, Cem; Kebude, Dogancan; Akgun, Baris. Reward Learning From Very Few Demonstrations. IEEE Transactions on Robotics, 2021, 37(3): 893-904.
• [2] Tung, Hsiao-Yu; Harley, Adam W.; Huang, Liang-Kang; Fragkiadaki, Katerina. Reward Learning from Narrated Demonstrations. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 7004-7013.
• [3] Abdo, Nichola; Kretzschmar, Henrik; Spinello, Luciano; Stachniss, Cyrill. Learning Manipulation Actions from a Few Demonstrations. 2013 IEEE International Conference on Robotics and Automation (ICRA), 2013: 1268-1275.
• [4] Jiang, Zhongliang; Bi, Yuan; Zhou, Mingchuan; Hu, Ying; Burke, Michael; Navab, Nassir. Intelligent Robotic Sonographer: Mutual Information-Based Disentangled Reward Learning from Few Demonstrations. International Journal of Robotics Research, 2024, 43(7): 981-1002.
• [5] Ibarz, Borja; Leike, Jan; Pohlen, Tobias; Irving, Geoffrey; Legg, Shane; Amodei, Dario. Reward Learning from Human Preferences and Demonstrations in Atari. Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018.
• [6] Karimi, Zohre; Ho, Shing-Hei; Thach, Bao; Kuntz, Alan; Brown, Daniel S. Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery. 2024 International Symposium on Medical Robotics (ISMR), 2024.
• [7] Peng, Yong; Zeng, Junjie; Hu, Yue; Fang, Qi; Yin, Quanjun. Reinforcement Learning from Suboptimal Demonstrations Based on Reward Relabeling. Expert Systems with Applications, 2024, 255.
• [8] Zhu, Jihong; Gienger, Michael; Kober, Jens. Learning Task-Parameterized Skills From Few Demonstrations. IEEE Robotics and Automation Letters, 2022, 7(2): 4063-4070.
• [9] Helenon, Francois; Bimont, Laurent; Nyiri, Eric; Thiery, Stephane; Gibaru, Olivier. Learning Prohibited and Authorised Grasping Locations from a Few Demonstrations. 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2020: 1094-1100.
• [10] Guo, Yanjiang; Gao, Jingyue; Wu, Zheng; Shi, Chengming; Chen, Jianyu. Reinforcement Learning with Demonstrations from Mismatched Task under Sparse Reward. Conference on Robot Learning, 2022, 205: 1146-1156.