Robust Reinforcement Learning with Bayesian Optimisation and Quadrature

Cited: 0
Authors
Paul, Supratik [1 ]
Chatzilygeroudis, Konstantinos [2 ,3 ,4 ]
Ciosek, Kamil [1 ]
Mouret, Jean-Baptiste [2 ,3 ,4 ]
Osborne, Michael A. [5 ]
Whiteson, Shimon [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Wolfson Bldg,Parks Rd, Oxford OX1 3QD, England
[2] INRIA, Paris, France
[3] Univ Lorraine, Nancy, France
[4] CNRS, Paris, France
[5] Univ Oxford, Dept Engn Sci, Oxford, England
Funding
European Research Council
Keywords
Reinforcement Learning; Bayesian Optimisation; Bayesian Quadrature; Significant rare events; Environment variables;
DOI
Not available
CLC Classification Number
TP [Automation technology, computer technology]
Discipline Classification Code
0812
Abstract
Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This article considers the problem of finding a robust policy while taking into account the impact of environment variables. We present alternating optimisation and quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. We also present transferable ALOQ (TALOQ), for settings where simulator inaccuracies lead to difficulty in transferring the learnt policy to the physical system. We show that our algorithms are robust to the presence of significant rare events, which may not be observable under random sampling but play a substantial role in determining the optimal policy. Experimental results across different domains show that our algorithms learn robust policies efficiently.
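The abstract describes ALOQ only at a high level: an optimisation step over policy parameters alternates with a quadrature step that marginalises the return over an unobservable environment variable, with care taken so that significant rare events are not missed. The toy sketch below illustrates only that alternating structure, not the paper's method: it replaces Bayesian optimisation with grid search and Bayesian quadrature with a fixed weighted sum, and the simulator, environment distribution, and every name in it are invented for illustration.

```python
import numpy as np

def simulator(policy, env):
    # Toy return: the best policy depends on the environment variable, and a
    # rare environment value (a "significant rare event") heavily penalises
    # aggressive policies.
    base = -(policy - env) ** 2
    rare_penalty = -10.0 * policy ** 2 if env > 1.5 else 0.0
    return base + rare_penalty

# Known distribution p(env) over the environment variable: mostly small
# values, with a rare large value of probability 0.01.
env_grid = np.array([0.0, 0.5, 1.0, 2.0])
env_probs = np.array([0.5, 0.3, 0.19, 0.01])

def expected_return(policy):
    # "Quadrature" step: a deterministic weighted sum over environment values,
    # standing in for Bayesian quadrature over a surrogate of f(policy, env).
    returns = np.array([simulator(policy, e) for e in env_grid])
    return float(np.dot(env_probs, returns))

# "Optimisation" step: grid search over policies, standing in for Bayesian
# optimisation of the estimated expected return.
policies = np.linspace(-1.0, 2.0, 61)
best = max(policies, key=expected_return)
```

Note how the rare event shifts the optimum: naive random sampling of `env` would almost never draw the penalising value, yet marginalising over the full distribution pulls the best policy away from the policy that is optimal for the common environment values.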
Pages: 31