Robust Reinforcement Learning with Bayesian Optimisation and Quadrature

Cited: 0
Authors
Paul, Supratik [1 ]
Chatzilygeroudis, Konstantinos [2 ,3 ,4 ]
Ciosek, Kamil [1 ]
Mouret, Jean-Baptiste [2 ,3 ,4 ]
Osborne, Michael A. [5 ]
Whiteson, Shimon [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Wolfson Bldg,Parks Rd, Oxford OX1 3QD, England
[2] INRIA, Paris, France
[3] Univ Lorraine, Nancy, France
[4] CNRS, Paris, France
[5] Univ Oxford, Dept Engn Sci, Oxford, England
Funding
European Research Council
Keywords
Reinforcement Learning; Bayesian Optimisation; Bayesian Quadrature; Significant rare events; Environment variables;
DOI
Not available
CLC Classification Number
TP [Automation technology, computer technology]
Discipline Classification Code
0812
Abstract
Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This article considers the problem of finding a robust policy while taking into account the impact of environment variables. We present alternating optimisation and quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. We also present transferable ALOQ (TALOQ), for settings where simulator inaccuracies lead to difficulty in transferring the learnt policy to the physical system. We show that our algorithms are robust to the presence of significant rare events, which may not be observable under random sampling but play a substantial role in determining the optimal policy. Experimental results across different domains show that our algorithms learn robust policies efficiently.
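The abstract describes ALOQ only at a high level: an optimisation step over policy parameters alternates with a quadrature step that marginalises the return over an unobservable environment variable, with care taken so that significant rare events are not missed. The toy sketch below illustrates only that alternating structure, not the paper's method: it replaces Bayesian optimisation with grid search and Bayesian quadrature with a fixed weighted sum, and the simulator, environment distribution, and every name in it are invented for illustration.

```python
import numpy as np

def simulator(policy, env):
    # Toy return: the best policy depends on the environment variable, and a
    # rare environment value (a "significant rare event") heavily penalises
    # aggressive policies.
    base = -(policy - env) ** 2
    rare_penalty = -10.0 * policy ** 2 if env > 1.5 else 0.0
    return base + rare_penalty

# Known distribution p(env) over the environment variable: mostly small
# values, with a rare large value of probability 0.01.
env_grid = np.array([0.0, 0.5, 1.0, 2.0])
env_probs = np.array([0.5, 0.3, 0.19, 0.01])

def expected_return(policy):
    # "Quadrature" step: a deterministic weighted sum over environment values,
    # standing in for Bayesian quadrature over a surrogate of f(policy, env).
    returns = np.array([simulator(policy, e) for e in env_grid])
    return float(np.dot(env_probs, returns))

# "Optimisation" step: grid search over policies, standing in for Bayesian
# optimisation of the estimated expected return.
policies = np.linspace(-1.0, 2.0, 61)
best = max(policies, key=expected_return)
```

Note how the rare event shifts the optimum: naive random sampling of `env` would almost never draw the penalising value, yet marginalising over the full distribution pulls the best policy away from the policy that is optimal for the common environment values.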
Pages: 31