Sample and Feedback Efficient Hierarchical Reinforcement Learning from Human Preferences

Cited by: 0

Authors:

Pinsler, Robert [1]

Akrour, Riad [2]

Osa, Takayuki [3,4]

Peters, Jan [2]

Neumann, Gerhard [2,5]

Affiliations:

[1] Univ Cambridge, Engn Dept, Cambridge, England

[2] Tech Univ Darmstadt, Fachbereich Informat, Darmstadt, Germany

[3] Univ Tokyo, Dept Complex Sci & Engn, Bunkyo Ku, Tokyo, Japan

[4] RIKEN, Ctr AIP, Chuo Ku, Tokyo, Japan

[5] Univ Lincoln, Sch Comp Sci, Lincoln, England

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA) | 2018

Funding:

EU Horizon 2020; UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

LOOP;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

While reinforcement learning has led to promising results in robotics, defining an informative reward function is challenging. Prior work considered including the human in the loop to jointly learn the reward function and the optimal policy. Generating samples from a physical robot and requesting human feedback are both taxing efforts for which efficiency is critical. We propose to learn reward functions from both the robot and the human perspectives to improve on both efficiency metrics. Learning a reward function from the human perspective increases feedback efficiency by assuming that humans rank trajectories according to a low-dimensional outcome space. Learning a reward function from the robot perspective circumvents the need for a dynamics model while retaining the sample efficiency of model-based approaches. We provide an algorithm that incorporates bi-perspective reward learning into a general hierarchical reinforcement learning framework and demonstrate the merits of our approach on a toy task and a simulated robot grasping task.
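The abstract's idea of learning a reward from human rankings over a low-dimensional outcome space can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a linear reward over 2-D outcomes and a Bradley-Terry preference model, and the names `fit_preference_reward`, `reward`, `true_w`, and the synthetic "human" are all hypothetical choices made for illustration.

```python
import math
import random


def reward(w, outcome):
    """Linear reward over a low-dimensional outcome vector (an assumption)."""
    return sum(wi * oi for wi, oi in zip(w, outcome))


def fit_preference_reward(pairs, dim, lr=0.5, epochs=200):
    """Fit reward weights from (preferred_outcome, other_outcome) pairs.

    Assumes a Bradley-Terry model: P(a preferred over b) = sigmoid(r(a) - r(b)),
    and maximizes the mean log-likelihood by plain gradient ascent.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for better, worse in pairs:
            diff = [a - b for a, b in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # d/dw log sigmoid(margin) = (1 - sigmoid(margin)) * diff
            coeff = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(dim):
                grad[i] += coeff * diff[i]
        w = [wi + lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


# Synthetic stand-in for a human: ranks outcomes by a hidden linear utility.
rng = random.Random(0)
true_w = [1.0, -2.0]  # hypothetical hidden preference direction
outcomes = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
pairs = []
for _ in range(100):
    a, b = rng.sample(outcomes, 2)
    pairs.append((a, b) if reward(true_w, a) > reward(true_w, b) else (b, a))

w_hat = fit_preference_reward(pairs, dim=2)
# Fraction of training comparisons the learned reward orders the same way.
agreement = sum(reward(w_hat, a) > reward(w_hat, b) for a, b in pairs) / len(pairs)
print("learned weights:", w_hat, "agreement:", agreement)
```

Because the comparisons are made over two outcome dimensions rather than full trajectories, relatively few preference queries are needed to pin down the reward direction, which is the intuition behind the feedback-efficiency claim in the abstract.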


Pages: 596-601

Page count: 6
