Active Preference-Based Gaussian Process Regression for Reward Learning

Cited by: 0
Authors
Biyik, Erdem [1]
Huynh, Nicolas [2]
Kochenderfer, Mykel J. [3]
Sadigh, Dorsa [4]
Affiliations
[1] Stanford University, Electrical Engineering, Stanford, CA 94305, USA
[2] Ecole Polytechnique, Applied Mathematics, Palaiseau, France
[3] Stanford University, Aeronautics & Astronautics, Stanford, CA 94305, USA
[4] Stanford University, Computer Science, Stanford, CA 94305, USA
Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification
TP24 [Robotics]
Discipline Classification Code
080202; 1405
Abstract
Designing reward functions is a challenging problem in AI and robotics. Humans usually have a difficult time directly specifying all the desirable behaviors that a robot needs to optimize. One common approach is to learn reward functions from collected expert demonstrations. However, learning reward functions from demonstrations introduces many challenges: some methods require highly structured models, e.g., reward functions that are linear in some predefined set of features, while others adopt less structured reward functions that in turn require a tremendous amount of data. In addition, humans tend to have a difficult time providing demonstrations on robots with high degrees of freedom, or even quantifying reward values for given demonstrations. To address these challenges, we present a preference-based learning approach in which, as an alternative, human feedback is given only in the form of comparisons between trajectories. Furthermore, we do not assume a highly constrained structure on the reward function. Instead, we model the reward function using a Gaussian process (GP) and propose a mathematical formulation to actively fit the GP using only human preferences. Our approach enables us to tackle both inflexibility and data-inefficiency problems within a preference-based learning framework. Our results in simulations and a user study suggest that our approach can efficiently learn expressive reward functions for robotics tasks.
Pages: 10
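To make the setup in the abstract concrete, the following is a minimal, hypothetical Python sketch of preference-based GP reward learning, not the authors' implementation. Assumptions: an RBF kernel over trajectory features, a probit likelihood for pairwise comparisons, a crude gradient-ascent (Laplace-style) search for the posterior mode, and a simple maximum-uncertainty heuristic for choosing the next query pair in place of the paper's active query-selection objective. All class and function names (e.g., PreferenceGP, select_query) are illustrative.

```python
import numpy as np
from scipy.stats import norm


def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel over trajectory feature vectors."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)


class PreferenceGP:
    """GP over rewards at a fixed pool of candidate trajectory features X."""

    def __init__(self, X, noise=1.0):
        self.X = X                                  # (n, d) trajectory features
        self.K = rbf_kernel(X, X) + 1e-4 * np.eye(len(X))
        self.K_inv = np.linalg.inv(self.K)
        self.noise = noise                          # preference noise scale
        self.f = np.zeros(len(X))                   # current reward estimate (posterior mode)
        self.prefs = []                             # list of (winner_idx, loser_idx)

    def add_preference(self, winner, loser):
        """Record one pairwise comparison and refit the posterior mode."""
        self.prefs.append((winner, loser))
        self._fit_mode()

    def _fit_mode(self, iters=200, lr=0.05):
        """Crude gradient ascent on the log posterior (Laplace-style mode search)."""
        f = self.f.copy()
        for _ in range(iters):
            grad = -self.K_inv @ f                  # gradient of the GP log prior
            for w, l in self.prefs:
                # probit likelihood: P(w preferred to l) = Phi((f_w - f_l) / (sqrt(2) * noise))
                z = (f[w] - f[l]) / (np.sqrt(2) * self.noise)
                g = norm.pdf(z) / max(norm.cdf(z), 1e-12)
                g /= np.sqrt(2) * self.noise
                grad[w] += g                        # pull the winner's reward up
                grad[l] -= g                        # and the loser's reward down
            f += lr * grad
        self.f = f

    def select_query(self):
        """Maximum-uncertainty heuristic: ask about the pair whose answer is least predictable."""
        n = len(self.X)
        best_u, best_pair = -1.0, (0, 1)
        for i in range(n):
            for j in range(i + 1, n):
                p = norm.cdf((self.f[i] - self.f[j]) / (np.sqrt(2) * self.noise))
                u = p * (1.0 - p)                   # Bernoulli variance of the human's answer
                if u > best_u:
                    best_u, best_pair = u, (i, j)
        return best_pair


# Usage with a simulated human who answers according to a hidden linear reward.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                        # candidate trajectory features
true_reward = X @ np.array([1.0, -0.5, 0.3])        # ground truth, used only for simulation
gp = PreferenceGP(X)
for _ in range(15):
    i, j = gp.select_query()
    winner, loser = (i, j) if true_reward[i] > true_reward[j] else (j, i)
    gp.add_preference(winner, loser)
print("correlation with true reward:", np.corrcoef(gp.f, true_reward)[0, 1])
```

Replacing the uncertainty heuristic with the information-gain-style query selection and the fixed candidate pool with actively generated trajectories would bring this sketch closer to the method the abstract describes.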