Robust Reinforcement Learning with Bayesian Optimisation and Quadrature

Cited: 0
Authors
Paul, Supratik [1 ]
Chatzilygeroudis, Konstantinos [2 ,3 ,4 ]
Ciosek, Kamil [1 ]
Mouret, Jean-Baptiste [2 ,3 ,4 ]
Osborne, Michael A. [5 ]
Whiteson, Shimon [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Wolfson Bldg,Parks Rd, Oxford OX1 3QD, England
[2] INRIA, Paris, France
[3] Univ Lorraine, Nancy, France
[4] CNRS, Paris, France
[5] Univ Oxford, Dept Engn Sci, Oxford, England
Funding
European Research Council
Keywords
Reinforcement Learning; Bayesian Optimisation; Bayesian Quadrature; Significant rare events; Environment variables;
DOI
None available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This article considers the problem of finding a robust policy while taking into account the impact of environment variables. We present alternating optimisation and quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. We also present transferable ALOQ (TALOQ), for settings where simulator inaccuracies lead to difficulty in transferring the learnt policy to the physical system. We show that our algorithms are robust to the presence of significant rare events, which may not be observable under random sampling but play a substantial role in determining the optimal policy. Experimental results across different domains show that our algorithms learn robust policies efficiently.
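The abstract's central idea can be illustrated with a toy sketch. This is not the paper's ALOQ algorithm: the Gaussian-process-based Bayesian optimisation over policy parameters is replaced by a coarse grid search, and Bayesian quadrature over environment variables is replaced by an exact weighted sum, since the environment variable here takes only two values. The simulator function, the rare-event probability, and all names are invented for illustration. The sketch shows why actively controlling the environment variable in the simulator beats random sampling when a significant rare event carries most of the risk:

```python
import random

# Toy simulator: the return depends on a policy parameter theta and an
# environment variable e that is random in the real world but can be set
# directly in the simulator. e = 1 is a "significant rare event" under
# which aggressive (high-theta) policies fail badly.
def simulated_return(theta, e):
    if e == 1:                            # rare event: crash penalty
        return -100.0 * theta
    return 10.0 * theta - theta ** 2      # nominal dynamics

P_RARE = 0.01
E_VALUES = [(0, 1 - P_RARE), (1, P_RARE)]  # known p(e) in the simulator

def expected_return(theta):
    # Stand-in for Bayesian quadrature: a weighted sum over *controlled*
    # settings of e, so the rare event always contributes its p(e) mass.
    return sum(p * simulated_return(theta, e) for e, p in E_VALUES)

def monte_carlo_return(theta, n, rng):
    # Naive alternative: sample e at random, as the physical system would;
    # with few samples the rare event is usually never observed.
    total = 0.0
    for _ in range(n):
        e = 1 if rng.random() < P_RARE else 0
        total += simulated_return(theta, e)
    return total / n

# Stand-in for Bayesian optimisation over theta: a coarse grid search.
candidates = [i * 0.5 for i in range(21)]  # theta in [0, 10]
best_robust = max(candidates, key=expected_return)

rng = random.Random(0)
best_naive = max(candidates, key=lambda t: monte_carlo_return(t, 50, rng))

print("robust theta:", best_robust)  # accounts for the rare event
print("naive theta:", best_naive)    # 50 random samples may miss e = 1
```

Under these invented dynamics the robust objective is 8.9·theta − 0.99·theta², so the controlled-quadrature search backs off to theta = 4.5, below the nominal optimum of 5; the Monte Carlo variant's answer depends on whether the rare event happens to be sampled.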
Pages: 31
Related Papers
50 records in total
  • [41] Flexible Transfer Learning Framework for Bayesian Optimisation
    Joy, Tinu Theckel
    Rana, Santu
    Gupta, Sunil Kumar
    Venkatesh, Svetha
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2016, PT I, 2016, 9651 : 102 - 114
  • [42] Regret Bounds for Transfer Learning in Bayesian Optimisation
    Shilton, Alistair
    Gupta, Sunil
    Rana, Santu
    Venkatesh, Svetha
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54, 2017, 54 : 307 - 315
  • [43] Particle Swarm Optimisation for learning Bayesian Networks
    Cowie, J.
    Oteniya, L.
    Coles, R.
    WORLD CONGRESS ON ENGINEERING 2007, VOLS 1 AND 2, 2007, : 71 - +
  • [44] Bayesian Reinforcement Learning and Bayesian Deep Learning for Blockchains With Mobile Edge Computing
    Asheralieva, Alia
    Niyato, Dusit
    IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2021, 7 (01) : 319 - 335
  • [45] Variational Bayesian surrogate modelling with application to robust design optimisation
    Archbold, Thomas A.
    Kazlauskaite, Ieva
    Cirak, Fehmi
    COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2024, 611
  • [46] Distral: Robust Multitask Reinforcement Learning
    Teh, Yee Whye
    Bapst, Victor
    Czarnecki, Wojciech Marian
    Quan, John
    Kirkpatrick, James
    Hadsell, Raia
    Heess, Nicolas
    Pascanu, Razvan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [47] Search for Robust Policies in Reinforcement Learning
    Li, Qi
    ICAART: PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 2, 2020, : 421 - 428
  • [48] Robust Approximation in Decomposed Reinforcement Learning
    Mori, Takeshi
    Ishii, Shin
    NEURAL INFORMATION PROCESSING, PT 1, PROCEEDINGS, 2009, 5863 : 590 - 597
  • [49] SPARSE BAYESIAN LEARNING FOR ROBUST PCA
    Liu, Jing
    Ding, Yacong
    Rao, Bhaskar
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 4883 - 4887
  • [50] Optimisation of Matrix Production System Reconfiguration with Reinforcement Learning
    Czarnetzki, Leonhard
    Laflamme, Catherine
    Halbwidl, Christoph
    Guenther, Lisa Charlotte
    Sobottka, Thomas
    Bachlechner, Daniel
    ADVANCES IN ARTIFICIAL INTELLIGENCE, KI 2023, 2023, 14236 : 15 - 22