Sample and Feedback Efficient Hierarchical Reinforcement Learning from Human Preferences

Cited: 0
Authors
Pinsler, Robert [1]
Akrour, Riad [2]
Osa, Takayuki [3,4]
Peters, Jan [2]
Neumann, Gerhard [2,5]
Affiliations
[1] Univ Cambridge, Engn Dept, Cambridge, England
[2] Tech Univ Darmstadt, Fachbereich Informat, Darmstadt, Germany
[3] Univ Tokyo, Dept Complex Sci & Engn, Bunkyo Ku, Tokyo, Japan
[4] RIKEN, Ctr AIP, Chuo Ku, Tokyo, Japan
[5] Univ Lincoln, Sch Comp Sci, Lincoln, England
Funding
EU Horizon 2020; UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
LOOP;
DOI
Not available
CLC classification
TP [Automation & Computer Technology]
Discipline code
0812
Abstract
While reinforcement learning has led to promising results in robotics, defining an informative reward function is challenging. Prior work considered including the human in the loop to jointly learn the reward function and the optimal policy. Generating samples from a physical robot and requesting human feedback are both taxing efforts for which efficiency is critical. We propose to learn reward functions from both the robot and the human perspectives to improve on both efficiency metrics. Learning a reward function from the human perspective increases feedback efficiency by assuming that humans rank trajectories according to a low-dimensional outcome space. Learning a reward function from the robot perspective circumvents the need for a dynamics model while retaining the sample efficiency of model-based approaches. We provide an algorithm that incorporates bi-perspective reward learning into a general hierarchical reinforcement learning framework and demonstrate the merits of our approach on a toy task and a simulated robot grasping task.
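The abstract's core mechanism, learning a reward function from human rankings over a low-dimensional outcome space, can be illustrated with a minimal sketch. This is not the paper's algorithm; it is a generic Bradley-Terry-style preference model fit by gradient ascent, with a hypothetical linear reward on outcome features, shown only to make the idea of preference-based reward learning concrete.

```python
# Illustrative sketch (NOT the paper's method): fit a linear reward
# r(x) = w . x on a low-dimensional outcome space from pairwise human
# preferences, using a Bradley-Terry likelihood P(a > b) = sigmoid(r(a) - r(b)).
# All function and variable names here are hypothetical.
import math
import random

def fit_preference_reward(preferences, dim, lr=0.1, epochs=200):
    """preferences: list of (outcome_a, outcome_b) pairs, each a length-`dim`
    tuple, where outcome_a was judged better than outcome_b."""
    w = [0.0] * dim
    for _ in range(epochs):
        for a, b in preferences:
            # predicted probability that a is preferred over b
            diff = sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b))
            p = 1.0 / (1.0 + math.exp(-diff))
            # gradient ascent on the log-likelihood of the observed preference
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (a[i] - b[i])
    return w

# Toy check: the (simulated) human always prefers outcomes with a larger
# first coordinate; the second coordinate is irrelevant noise.
random.seed(0)
prefs = []
for _ in range(100):
    x = (random.random(), random.random())
    y = (random.random(), random.random())
    prefs.append((x, y) if x[0] > y[0] else (y, x))

w = fit_preference_reward(prefs, dim=2)
```

Because preferences depend only on the first outcome dimension, the fitted weight on that dimension dominates, which is the feedback-efficiency argument in the abstract: ranking in a low-dimensional outcome space lets few comparisons pin down the reward.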
Pages: 596-601
Page count: 6
Related papers
50 records in total
  • [1] Sequential Preference Ranking for Efficient Reinforcement Learning from Human Feedback
    Hwang, Minyoung
    Lee, Gunmin
    Kee, Hogun
    Kim, Chanwoo
    Lee, Kyungjae
    Oh, Songhwai
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] Deep Reinforcement Learning from Human Preferences
    Christiano, Paul F.
    Leike, Jan
    Brown, Tom B.
    Martic, Miljan
    Legg, Shane
    Amodei, Dario
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [3] Hierarchical learning from human preferences and curiosity
    Bougie, Nicolas
    Ichise, Ryutaro
    APPLIED INTELLIGENCE, 2022, 52 (07) : 7459 - 7479
  • [4] Extensive and efficient search of human movements with hierarchical reinforcement learning
    Mukai, T
    Kuriyama, S
    Kaneko, T
    CA 2002: PROCEEDINGS OF THE COMPUTER ANIMATION 2002, 2002, : 103 - 107
  • [5] Human Social Feedback for Efficient Interactive Reinforcement Agent Learning
    Lin, Jinying
    Zhang, Qilei
    Gomez, Randy
    Nakamura, Keisuke
    He, Bo
    Li, Guangliang
    2020 29TH IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION (RO-MAN), 2020, : 706 - 712
  • [6] Root Cause Analysis for Microservice Systems via Hierarchical Reinforcement Learning from Human Feedback
    Wang, Lu
    Zhang, Chaoyun
    Ding, Ruomeng
    Xu, Yong
    Chen, Qihang
    Zou, Wentao
    Chen, Qingjun
    Zhang, Meng
    Gao, Xuedong
    Fan, Hao
    Rajmohan, Saravan
    Lin, Qingwei
    Zhang, Dongmei
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 5116 - 5125
  • [7] Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft
    Scheller, Christian
    Schraner, Yanick
    Vogel, Manfred
    NEURIPS 2019 COMPETITION AND DEMONSTRATION TRACK, VOL 123, 2019, 123 : 67 - 76
  • [8] Towards Sample Efficient Reinforcement Learning
    Yu, Yang
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 5739 - 5743
  • [9] Sample Efficient Reinforcement Learning with REINFORCE
    Zhang, Junzi
    Kim, Jongho
    O'Donoghue, Brendan
    Boyd, Stephen
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10887 - 10895