Robust Reinforcement Learning with Bayesian Optimisation and Quadrature

Cited by: 0
Authors
Paul, Supratik [1 ]
Chatzilygeroudis, Konstantinos [2 ,3 ,4 ]
Ciosek, Kamil [1 ]
Mouret, Jean-Baptiste [2 ,3 ,4 ]
Osborne, Michael A. [5 ]
Whiteson, Shimon [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Wolfson Bldg,Parks Rd, Oxford OX1 3QD, England
[2] INRIA, Paris, France
[3] Univ Lorraine, Nancy, France
[4] CNRS, Paris, France
[5] Univ Oxford, Dept Engn Sci, Oxford, England
Funding
European Research Council
Keywords
Reinforcement Learning; Bayesian Optimisation; Bayesian Quadrature; Significant rare events; Environment variables;
DOI
Not available
Chinese Library Classification
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This article considers the problem of finding a robust policy while taking into account the impact of environment variables. We present alternating optimisation and quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. We also present transferable ALOQ (TALOQ), for settings where simulator inaccuracies lead to difficulty in transferring the learnt policy to the physical system. We show that our algorithms are robust to the presence of significant rare events, which may not be observable under random sampling but play a substantial role in determining the optimal policy. Experimental results across different domains show that our algorithms learn robust policies efficiently.
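The alternation the abstract describes — estimating a policy's expected return over the environment variables by quadrature, while searching over policy parameters — can be illustrated with a minimal toy sketch. This is not the paper's ALOQ algorithm (which fits a Gaussian process over policy and environment variables and uses Bayesian optimisation for the policy search); here the quadrature step is plain Gauss-Hermite quadrature for a Gaussian environment variable, and the optimisation step is a coarse grid search. All names (`simulated_return`, the return surface, the distribution of `nu`) are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setting (not from the paper): the policy is a single
# scalar parameter theta; the environment variable nu is unobservable at
# run time and distributed N(0, 1), but can be set directly in a simulator.
def simulated_return(theta, nu):
    # Toy return surface; the robust optimum is theta = 0.
    return -(theta - nu) ** 2

# Quadrature step (stand-in for Bayesian quadrature): Gauss-Hermite nodes
# and weights estimate E_nu[f(theta, nu)] for nu ~ N(0, 1). ALOQ instead
# integrates a Gaussian-process posterior over nu.
nodes, weights = np.polynomial.hermite.hermgauss(20)
nu_samples = np.sqrt(2.0) * nodes        # change of variables for N(0, 1)
quad_weights = weights / np.sqrt(np.pi)  # weights now sum to 1

def expected_return(theta):
    return float(np.sum(quad_weights * simulated_return(theta, nu_samples)))

# Optimisation step (stand-in for Bayesian optimisation): grid search over
# theta, maximising the quadrature estimate of the expected return.
candidates = np.linspace(-2.0, 2.0, 81)
best_theta = max(candidates, key=expected_return)
print(best_theta)  # close to 0, the robust optimum
```

Because the expected return here is -(theta^2 + 1), the robust policy is theta = 0; a policy tuned to any single draw of nu would instead track that draw, which is exactly the failure mode marginalising over environment variables avoids.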
Pages: 31