Active Preference-Based Gaussian Process Regression for Reward Learning

Cited by: 0
Authors
Biyik, Erdem [1]
Huynh, Nicolas [2]
Kochenderfer, Mykel J. [3]
Sadigh, Dorsa [4]
Affiliations
[1] Stanford University, Electrical Engineering, Stanford, CA 94305, USA
[2] Ecole Polytechnique, Applied Mathematics, Palaiseau, France
[3] Stanford University, Aeronautics & Astronautics, Stanford, CA 94305, USA
[4] Stanford University, Computer Science, Stanford, CA 94305, USA
Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification
TP24 [Robotics]
Discipline Classification Code
080202; 1405
Abstract
Designing reward functions is a challenging problem in AI and robotics. Humans usually have a difficult time directly specifying all the desirable behaviors that a robot needs to optimize. One common approach is to learn reward functions from collected expert demonstrations. However, learning reward functions from demonstrations introduces many challenges: some methods require highly structured models, e.g., reward functions that are linear in some predefined set of features, while others adopt less structured reward functions that in turn require a tremendous amount of data. In addition, humans tend to have a difficult time providing demonstrations on robots with high degrees of freedom, or even quantifying reward values for given demonstrations. To address these challenges, we present a preference-based learning approach in which, as an alternative, human feedback is given only in the form of comparisons between trajectories. Furthermore, we do not assume a highly constrained structure on the reward function. Instead, we model the reward function using a Gaussian process (GP) and propose a mathematical formulation to actively fit the GP using only human preferences. Our approach enables us to tackle both inflexibility and data-inefficiency problems within a preference-based learning framework. Our results in simulations and a user study suggest that our approach can efficiently learn expressive reward functions for robotics tasks.
Pages: 10
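To make the setup in the abstract concrete, the following is a minimal, hypothetical Python sketch of preference-based GP reward learning, not the authors' implementation. Assumptions: an RBF kernel over trajectory features, a probit likelihood for pairwise comparisons, a crude gradient-ascent (Laplace-style) search for the posterior mode, and a simple maximum-uncertainty heuristic for choosing the next query pair in place of the paper's active query-selection objective. All class and function names (e.g., PreferenceGP, select_query) are illustrative.

```python
import numpy as np
from scipy.stats import norm


def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel over trajectory feature vectors."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)


class PreferenceGP:
    """GP over rewards at a fixed pool of candidate trajectory features X."""

    def __init__(self, X, noise=1.0):
        self.X = X                                  # (n, d) trajectory features
        self.K = rbf_kernel(X, X) + 1e-4 * np.eye(len(X))
        self.K_inv = np.linalg.inv(self.K)
        self.noise = noise                          # preference noise scale
        self.f = np.zeros(len(X))                   # current reward estimate (posterior mode)
        self.prefs = []                             # list of (winner_idx, loser_idx)

    def add_preference(self, winner, loser):
        """Record one pairwise comparison and refit the posterior mode."""
        self.prefs.append((winner, loser))
        self._fit_mode()

    def _fit_mode(self, iters=200, lr=0.05):
        """Crude gradient ascent on the log posterior (Laplace-style mode search)."""
        f = self.f.copy()
        for _ in range(iters):
            grad = -self.K_inv @ f                  # gradient of the GP log prior
            for w, l in self.prefs:
                # probit likelihood: P(w preferred to l) = Phi((f_w - f_l) / (sqrt(2) * noise))
                z = (f[w] - f[l]) / (np.sqrt(2) * self.noise)
                g = norm.pdf(z) / max(norm.cdf(z), 1e-12)
                g /= np.sqrt(2) * self.noise
                grad[w] += g                        # pull the winner's reward up
                grad[l] -= g                        # and the loser's reward down
            f += lr * grad
        self.f = f

    def select_query(self):
        """Maximum-uncertainty heuristic: ask about the pair whose answer is least predictable."""
        n = len(self.X)
        best_u, best_pair = -1.0, (0, 1)
        for i in range(n):
            for j in range(i + 1, n):
                p = norm.cdf((self.f[i] - self.f[j]) / (np.sqrt(2) * self.noise))
                u = p * (1.0 - p)                   # Bernoulli variance of the human's answer
                if u > best_u:
                    best_u, best_pair = u, (i, j)
        return best_pair


# Usage with a simulated human who answers according to a hidden linear reward.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                        # candidate trajectory features
true_reward = X @ np.array([1.0, -0.5, 0.3])        # ground truth, used only for simulation
gp = PreferenceGP(X)
for _ in range(15):
    i, j = gp.select_query()
    winner, loser = (i, j) if true_reward[i] > true_reward[j] else (j, i)
    gp.add_preference(winner, loser)
print("correlation with true reward:", np.corrcoef(gp.f, true_reward)[0, 1])
```

Replacing the uncertainty heuristic with the information-gain-style query selection and the fixed candidate pool with actively generated trajectories would bring this sketch closer to the method the abstract describes.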