Reward learning from human preferences and demonstrations in Atari

Cited: 0
Authors
Ibarz, Borja [1 ]
Leike, Jan [1 ]
Pohlen, Tobias [1 ]
Irving, Geoffrey [2 ]
Legg, Shane [1 ]
Amodei, Dario [2 ]
Affiliations
[1] DeepMind, London, England
[2] OpenAI, San Francisco, CA, USA
Keywords
NEURAL-NETWORKS; DEEP
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we can have humans communicate an objective to the agent directly. In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences. We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games without using game rewards. Additionally, we investigate the goodness of fit of the reward model, present examples of reward hacking, and study the effects of noise in the human labels.
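The preference-learning signal described in the abstract can be made concrete with a short sketch. Below is a minimal, illustrative reconstruction (not the authors' released code) of the reward-modelling step: a network predicts a per-timestep reward, rewards are summed over each trajectory segment, and a Bradley-Terry cross-entropy loss fits the segment sums to human preference labels. The network architecture, segment length, and all variable names are assumptions for illustration; the demonstration component, which the paper handles with a DQfD-style agent, is omitted.

```python
# A minimal sketch of preference-based reward modelling, assuming
# flattened observation vectors instead of raw Atari frames.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int):
        super().__init__()
        # Small MLP standing in for the paper's convolutional reward network.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, segment_len, obs_dim) -> per-step rewards,
        # summed over the segment to score the whole clip.
        return self.net(obs).squeeze(-1).sum(dim=-1)

def preference_loss(model, seg_a, seg_b, prefs):
    """Cross-entropy on P(seg_a preferred) under a Bradley-Terry model.

    prefs is 1.0 when the human preferred segment A, 0.0 when they
    preferred B (0.5 can encode "equally good").
    """
    ra, rb = model(seg_a), model(seg_b)
    logits = ra - rb  # log-odds that segment A is preferred
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)

# Toy usage with random data standing in for short trajectory segments.
model = RewardModel(obs_dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
seg_a = torch.randn(32, 25, 16)
seg_b = torch.randn(32, 25, 16)
prefs = torch.randint(0, 2, (32,)).float()
loss = preference_loss(model, seg_a, seg_b, prefs)
opt.zero_grad(); loss.backward(); opt.step()
```

In the full pipeline, the trained model's predicted reward then replaces the game score as the training signal for the DQN-based agent, with preference labels collected online as the policy improves.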
Pages: 13
Related Papers
50 items in total (first 10 shown)
  • [1] Learning Reward Functions by Integrating Human Demonstrations and Preferences
    Palan, Malayandi
    Shevchuk, Gleb
    Landolfi, Nicholas C.
    Sadigh, Dorsa
    ROBOTICS: SCIENCE AND SYSTEMS XV, 2019
  • [2] Joint Estimation of Expertise and Reward Preferences From Human Demonstrations
    Carreno-Medrano, Pamela
    Smith, Stephen L.
    Kulic, Dana
    IEEE TRANSACTIONS ON ROBOTICS, 2023, 39(1): 681-698
  • [3] Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences
    Biyik, Erdem
    Losey, Dylan P.
    Palan, Malayandi
    Landolfi, Nicholas C.
    Shevchuk, Gleb
    Sadigh, Dorsa
    INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2022, 41(1): 45-67
  • [4] Reward Learning from Narrated Demonstrations
    Tung, Hsiao-Yu
    Harley, Adam W.
    Huang, Liang-Kang
    Fragkiadaki, Katerina
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 7004-7013
  • [5] Reward Learning From Very Few Demonstrations
    Eteke, Cem
    Kebude, Dogancan
    Akgun, Baris
    IEEE TRANSACTIONS ON ROBOTICS, 2021, 37(3): 893-904
  • [6] Model-based Adversarial Imitation Learning from Demonstrations and Human Reward
    Huang, Jie
    Hao, Jiangshan
    Juan, Rongshun
    Gomez, Randy
    Nakamura, Keisuke
    Li, Guangliang
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS, 2023: 1683-1690
  • [7] Batch Active Learning of Reward Functions from Human Preferences
    Biyik, Erdem
    Anari, Nima
    Sadigh, Dorsa
    ACM TRANSACTIONS ON HUMAN-ROBOT INTERACTION, 2024, 13(2)
  • [8] Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery
    Karimi, Zohre
    Ho, Shing-Hei
    Thach, Bao
    Kuntz, Alan
    Brown, Daniel S.
    2024 INTERNATIONAL SYMPOSIUM ON MEDICAL ROBOTICS, ISMR 2024, 2024
  • [9] Active Reward Learning from Online Preferences
    Myers, Vivek
    Biyik, Erdem
    Sadigh, Dorsa
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023: 7511-7518
  • [10] Objective learning from human demonstrations
    Lin, Jonathan Feng-Shun
    Carreno-Medrano, Pamela
    Parsapour, Mahsa
    Sakr, Maram
    Kulic, Dana
    ANNUAL REVIEWS IN CONTROL, 2021, 51: 111-129