Sample and Feedback Efficient Hierarchical Reinforcement Learning from Human Preferences

Cited by: 0
Authors
Pinsler, Robert [1 ]
Akrour, Riad [2 ]
Osa, Takayuki [3 ,4 ]
Peters, Jan [2 ]
Neumann, Gerhard [2 ,5 ]
Affiliations
[1] Univ Cambridge, Engn Dept, Cambridge, England
[2] Tech Univ Darmstadt, Fachbereich Informat, Darmstadt, Germany
[3] Univ Tokyo, Dept Complex Sci & Engn, Bunkyo Ku, Tokyo, Japan
[4] RIKEN, Ctr AIP, Chuo Ku, Tokyo, Japan
[5] Univ Lincoln, Sch Comp Sci, Lincoln, England
Funding
EU Horizon 2020; UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
LOOP
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
While reinforcement learning has led to promising results in robotics, defining an informative reward function is challenging. Prior work considered including the human in the loop to jointly learn the reward function and the optimal policy. Generating samples from a physical robot and requesting human feedback are both taxing efforts for which efficiency is critical. We propose to learn reward functions from both the robot and the human perspectives to improve on both efficiency metrics. Learning a reward function from the human perspective increases feedback efficiency by assuming that humans rank trajectories according to a low-dimensional outcome space. Learning a reward function from the robot perspective circumvents the need for a dynamics model while retaining the sample efficiency of model-based approaches. We provide an algorithm that incorporates bi-perspective reward learning into a general hierarchical reinforcement learning framework and demonstrate the merits of our approach on a toy task and a simulated robot grasping task.
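As a rough illustration of the feedback-efficiency idea in the abstract (ranking trajectories via a low-dimensional outcome space), the sketch below fits a reward over outcome vectors from pairwise human preferences using a Bradley-Terry (logistic) likelihood. The linear reward form, the function names, and the toy data are assumptions for illustration only, not the authors' implementation.

import numpy as np

# Hypothetical sketch: learn a linear reward r(o) = w^T o over low-dimensional
# trajectory outcomes o from pairwise human preferences, via gradient ascent
# on a Bradley-Terry log-likelihood. Not the paper's actual algorithm.

def fit_reward(outcomes_a, outcomes_b, prefers_a, lr=0.1, iters=500):
    # outcomes_a, outcomes_b: (N, d) outcome vectors of the compared trajectories
    # prefers_a: (N,) array, 1.0 if the human preferred trajectory a, else 0.0
    w = np.zeros(outcomes_a.shape[1])
    for _ in range(iters):
        diff = (outcomes_a - outcomes_b) @ w   # reward differences
        p_a = 1.0 / (1.0 + np.exp(-diff))      # P(a preferred) under Bradley-Terry
        # Gradient of the mean log-likelihood with respect to w
        grad = ((prefers_a - p_a)[:, None] * (outcomes_a - outcomes_b)).mean(axis=0)
        w += lr * grad
    return w

# Toy usage: 2-D outcomes where the (simulated) human only cares about dimension 0.
rng = np.random.default_rng(0)
oa, ob = rng.normal(size=(100, 2)), rng.normal(size=(100, 2))
prefs = (oa[:, 0] > ob[:, 0]).astype(float)
print(fit_reward(oa, ob, prefs))  # first weight should dominate

Because preferences are expressed over a d-dimensional outcome space rather than over full state-action trajectories, relatively few queries suffice to constrain the reward, which is one way to read the abstract's claim about feedback efficiency.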
Pages: 596-601
Number of pages: 6
Related Papers
50 items in total
  • [21] Reinforcement Learning From Hierarchical Critics
    Cao, Zehong
    Lin, Chin-Teng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34(2): 1066-1073
  • [22] A Provably Efficient Sample Collection Strategy for Reinforcement Learning
    Tarbouriech, Jean
    Pirotta, Matteo
    Valko, Michal
    Lazaric, Alessandro
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [23] Sample Efficient Offline-to-Online Reinforcement Learning
    Guo, Siyuan
    Zou, Lixin
    Chen, Hechang
    Qu, Bohao
    Chi, Haotian
    Yu, Philip S.
    Chang, Yi
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36(3): 1299-1310
  • [24] Sample Efficient Reinforcement Learning for Navigation in Complex Environments
    Moridian, Barzin
    Page, Brian R.
    Mahmoudian, Nina
    2019 IEEE INTERNATIONAL SYMPOSIUM ON SAFETY, SECURITY, AND RESCUE ROBOTICS (SSRR), 2019: 15-21
  • [25] Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
    Jin, Chi
    Kakade, Sham M.
    Krishnamurthy, Akshay
    Liu, Qinghua
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [26] Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge
    Alharbi, Meshal
    Roozbehani, Mardavij
    Dahleh, Munther
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38, NO 10, 2024: 10804-10811
  • [27] Learning to Interrupt: A Hierarchical Deep Reinforcement Learning Framework for Efficient Exploration
    Li, Tingguang
    Pan, Jin
    Zhu, Delong
    Meng, Max Q. -H.
    2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO), 2018: 648-653
  • [28] Aligning Human Preferences with Baseline Objectives in Reinforcement Learning
    Marta, Daniel
    Holk, Simon
    Pek, Christian
    Tumova, Jana
    Leite, Iolanda
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023: 7562-7568
  • [29] A Sample Efficiency Improved Method via Hierarchical Reinforcement Learning Networks
    Chen, Qinghua
    Dallas, Evan
    Shahverdi, Pourya
    Korneder, Jessica
    Rawashdeh, Osamah A.
    Louie, Wing-Yue Geoffrey
    2022 31ST IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION (IEEE RO-MAN 2022), 2022: 1498-1505
  • [30] An Efficient Approach to Model-Based Hierarchical Reinforcement Learning
    Li, Zhuoru
    Narayan, Akshay
    Leong, Tze-Yun
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017: 3583-3589