Budgeted Reinforcement Learning in Continuous State Space

Citations: 0
Authors
Carrara, Nicolas [1 ,6 ]
Leurent, Edouard [1 ,2 ,6 ]
Laroche, Romain [3 ]
Urvoy, Tanguy [4 ]
Maillard, Odalric-Ambrym [1 ]
Pietquin, Olivier [1 ,5 ,6 ]
Affiliations
[1] INRIA Lille, Nord Europe, SequeL Team, Lille, France
[2] Renault Grp, Boulogne, France
[3] Microsoft Res, Montreal, PQ, Canada
[4] Orange Labs, Lannion, France
[5] Google Res, Brain Team, Mountain View, CA USA
[6] Univ Lille, CNRS, Cent Lille, CRIStAL,INRIA,UMR 9189, Lille, France
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
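The abstract states that the solution to a BMDP is a fixed point of a Budgeted Bellman Optimality operator. As an illustration only, the sketch below iterates a heavily simplified budgeted backup on a toy two-state finite BMDP with a discretised budget grid; every name, the budget-snapping rule, and the fallback for infeasible states are assumptions of this sketch, not the paper's operator (which handles continuous states and jointly tracks reward and cost value functions).

```python
# Toy finite BMDP: 2 states, 2 actions (illustrative, not the paper's model).
# P[s][a] = list of (probability, next_state); R and C give reward and cost.
P = {
    0: {0: [(1.0, 0)], 1: [(1.0, 1)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 1.0}}  # reward signal
C = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 1.0}}  # cost (risk) signal
GAMMA = 0.9
BUDGETS = [0.0, 5.0, 10.0]  # discretised budget levels (a choice of this sketch)

def budgeted_backup(V):
    """One sweep of a simplified budgeted Bellman backup over the
    augmented (state, remaining budget) space: maximise reward over
    actions whose immediate cost fits within the remaining budget."""
    new_V = {}
    for s in P:
        for b in BUDGETS:
            best = None
            for a in P[s]:
                c = C[s][a]
                if c > b:  # action would exceed the cost budget
                    continue
                # Remaining budget evolves as (b - c) / GAMMA; snap it
                # back to the grid (a crude discretisation choice).
                nb = min(BUDGETS, key=lambda x: abs(x - (b - c) / GAMMA))
                q = R[s][a] + GAMMA * sum(p * V[(s2, nb)] for p, s2 in P[s][a])
                best = q if best is None else max(best, q)
            # No feasible action: treat the state as absorbing with value 0.
            new_V[(s, b)] = 0.0 if best is None else best
    return new_V

# Iterate towards the fixed point of the (simplified) operator.
V = {(s, b): 0.0 for s in P for b in BUDGETS}
for _ in range(200):
    V = budgeted_backup(V)
```

In this toy instance, a zero budget admits only the cost-free action, so the value stays at 0, while a generous budget lets the backup converge to the unconstrained return 1 / (1 - γ) = 10, showing how the threshold shapes the fixed point.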
Pages: 11