Reward learning from human preferences and demonstrations in Atari

Cited by: 0

Authors:

Ibarz, Borja [1]

Leike, Jan [1]

Pohlen, Tobias [1]

Irving, Geoffrey [2]

Legg, Shane [1]

Amodei, Dario [2]

Affiliations:

[1] DeepMind, London, England

[2] OpenAI, San Francisco, CA USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018) | 2018, Vol. 31

Keywords:

NEURAL-NETWORKS; DEEP;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we can have humans communicate an objective to the agent directly. In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences. We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games without using game rewards. Additionally, we investigate the goodness of fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.


Pages: 13
