Robust Reinforcement Learning with Bayesian Optimisation and Quadrature

Cited: 0
Authors
Paul, Supratik [1 ]
Chatzilygeroudis, Konstantinos [2 ,3 ,4 ]
Ciosek, Kamil [1 ]
Mouret, Jean-Baptiste [2 ,3 ,4 ]
Osborne, Michael A. [5 ]
Whiteson, Shimon [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Wolfson Bldg,Parks Rd, Oxford OX1 3QD, England
[2] INRIA, Paris, France
[3] Univ Lorraine, Nancy, France
[4] CNRS, Paris, France
[5] Univ Oxford, Dept Engn Sci, Oxford, England
Funding
European Research Council
Keywords
Reinforcement Learning; Bayesian Optimisation; Bayesian Quadrature; Significant rare events; Environment variables;
DOI
None available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This article considers the problem of finding a robust policy while taking into account the impact of environment variables. We present alternating optimisation and quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. We also present transferable ALOQ (TALOQ), for settings where simulator inaccuracies lead to difficulty in transferring the learnt policy to the physical system. We show that our algorithms are robust to the presence of significant rare events, which may not be observable under random sampling but play a substantial role in determining the optimal policy. Experimental results across different domains show that our algorithms learn robust policies efficiently.
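The abstract's central idea can be illustrated with a toy sketch. This is not the paper's ALOQ algorithm: the Gaussian-process-based Bayesian optimisation over policy parameters is replaced by a coarse grid search, and Bayesian quadrature over environment variables is replaced by an exact weighted sum, since the environment variable here takes only two values. The simulator function, the rare-event probability, and all names are invented for illustration. The sketch shows why actively controlling the environment variable in the simulator beats random sampling when a significant rare event carries most of the risk:

```python
import random

# Toy simulator: the return depends on a policy parameter theta and an
# environment variable e that is random in the real world but can be set
# directly in the simulator. e = 1 is a "significant rare event" under
# which aggressive (high-theta) policies fail badly.
def simulated_return(theta, e):
    if e == 1:                            # rare event: crash penalty
        return -100.0 * theta
    return 10.0 * theta - theta ** 2      # nominal dynamics

P_RARE = 0.01
E_VALUES = [(0, 1 - P_RARE), (1, P_RARE)]  # known p(e) in the simulator

def expected_return(theta):
    # Stand-in for Bayesian quadrature: a weighted sum over *controlled*
    # settings of e, so the rare event always contributes its p(e) mass.
    return sum(p * simulated_return(theta, e) for e, p in E_VALUES)

def monte_carlo_return(theta, n, rng):
    # Naive alternative: sample e at random, as the physical system would;
    # with few samples the rare event is usually never observed.
    total = 0.0
    for _ in range(n):
        e = 1 if rng.random() < P_RARE else 0
        total += simulated_return(theta, e)
    return total / n

# Stand-in for Bayesian optimisation over theta: a coarse grid search.
candidates = [i * 0.5 for i in range(21)]  # theta in [0, 10]
best_robust = max(candidates, key=expected_return)

rng = random.Random(0)
best_naive = max(candidates, key=lambda t: monte_carlo_return(t, 50, rng))

print("robust theta:", best_robust)  # accounts for the rare event
print("naive theta:", best_naive)    # 50 random samples may miss e = 1
```

Under these invented dynamics the robust objective is 8.9·theta − 0.99·theta², so the controlled-quadrature search backs off to theta = 4.5, below the nominal optimum of 5; the Monte Carlo variant's answer depends on whether the rare event happens to be sampled.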
Pages: 31
Related Papers
50 records in total
  • [41] Flexible Transfer Learning Framework for Bayesian Optimisation
    Joy, Tinu Theckel
    Rana, Santu
    Gupta, Sunil Kumar
    Venkatesh, Svetha
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2016, PT I, 2016, 9651 : 102 - 114
  • [42] Regret Bounds for Transfer Learning in Bayesian Optimisation
    Shilton, Alistair
    Gupta, Sunil
    Rana, Santu
    Venkatesh, Svetha
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54, 2017, 54 : 307 - 315
  • [43] Particle Swarm Optimisation for learning Bayesian Networks
    Cowie, J.
    Oteniya, L.
    Coles, R.
    WORLD CONGRESS ON ENGINEERING 2007, VOLS 1 AND 2, 2007, : 71 - +
  • [44] Bayesian Reinforcement Learning and Bayesian Deep Learning for Blockchains With Mobile Edge Computing
    Asheralieva, Alia
    Niyato, Dusit
    IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2021, 7 (01) : 319 - 335
  • [45] Variational Bayesian surrogate modelling with application to robust design optimisation
    Archbold, Thomas A.
    Kazlauskaite, Ieva
    Cirak, Fehmi
    COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2024, 611
  • [46] Distral: Robust Multitask Reinforcement Learning
    Teh, Yee Whye
    Bapst, Victor
    Czarnecki, Wojciech Marian
    Quan, John
    Kirkpatrick, James
    Hadsell, Raia
    Heess, Nicolas
    Pascanu, Razvan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [47] Search for Robust Policies in Reinforcement Learning
    Li, Qi
    ICAART: PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 2, 2020, : 421 - 428
  • [48] Robust Approximation in Decomposed Reinforcement Learning
    Mori, Takeshi
    Ishii, Shin
    NEURAL INFORMATION PROCESSING, PT 1, PROCEEDINGS, 2009, 5863 : 590 - 597
  • [49] SPARSE BAYESIAN LEARNING FOR ROBUST PCA
    Liu, Jing
    Ding, Yacong
    Rao, Bhaskar
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 4883 - 4887
  • [50] Optimisation of Matrix Production System Reconfiguration with Reinforcement Learning
    Czarnetzki, Leonhard
    Laflamme, Catherine
    Halbwidl, Christoph
    Guenther, Lisa Charlotte
    Sobottka, Thomas
    Bachlechner, Daniel
    ADVANCES IN ARTIFICIAL INTELLIGENCE, KI 2023, 2023, 14236 : 15 - 22