Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

Cited: 0
|
Authors
Brazdil, Tomas [1 ]
Chatterjee, Krishnendu [2 ]
Novotny, Petr [1 ]
Vahala, Jiri [1 ]
Affiliations
[1] Masaryk Univ, Fac Informat, Brno, Czech Republic
[2] IST Austria, Klosterneuburg, Austria
Funding
Austrian Science Fund (FWF);
Keywords
GO;
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Markov decision processes (MDPs) are the de facto framework for sequential decision making under stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with a highly negative impact on the system. Risk-averse policies, on the other hand, require the probability of undesirable events to stay below a given threshold, but they do not optimize the expected payoff. We consider MDPs with discounted-sum payoff and failure states, which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that keep the probability of encountering a failure state below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
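The risk-constrained action selection via linear programming mentioned in the abstract can be sketched as a small LP: given per-action payoff estimates and failure-probability estimates (which the paper's algorithm obtains from search and the learned predictor), pick a randomized action distribution that maximizes expected payoff subject to the risk bound. A minimal sketch using SciPy; the function name and inputs are illustrative, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import linprog

def risk_constrained_action_distribution(q_values, risk_estimates, threshold):
    """Return a distribution over actions maximizing expected payoff
    subject to the expected failure probability staying <= threshold.

    q_values       -- estimated discounted-sum payoff per action
    risk_estimates -- estimated failure probability per action
    threshold      -- risk bound delta
    """
    n = len(q_values)
    res = linprog(
        c=-np.asarray(q_values),       # linprog minimizes, so negate payoffs
        A_ub=[risk_estimates],         # expected risk <= threshold
        b_ub=[threshold],
        A_eq=[np.ones(n)],             # probabilities sum to 1
        b_eq=[1.0],
        bounds=[(0.0, 1.0)] * n,
        method="highs",
    )
    return res.x if res.success else None

# Example: the high-payoff action is too risky on its own, so the
# optimal policy mixes the two actions to meet the 10% risk bound.
dist = risk_constrained_action_distribution(
    q_values=[10.0, 2.0], risk_estimates=[0.3, 0.0], threshold=0.1)
```

Randomization matters here: a deterministic choice would either violate the risk bound or forgo payoff, whereas the LP vertex solution mixes actions exactly at the bound.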
Pages: 9794-9801
Page count: 8
Related Papers
50 entries total
  • [1] Risk-Constrained Markov Decision Processes
    Borkar, Vivek
    Jain, Rahul
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2014, 59 (09) : 2574 - 2579
  • [2] Risk-constrained Markov Decision Processes
    Borkar, Vivek
    Jain, Rahul
    [J]. 49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, : 2664 - 2669
  • [3] Reinforcement Learning for Constrained Markov Decision Processes
    Gattami, Ather
    Bai, Qinbo
    Aggarwal, Vaneet
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [4] Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
    Chow, Yinlam
    Ghavamzadeh, Mohammad
    Janson, Lucas
    Pavone, Marco
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 18
  • [5] Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
    Roy, Arghyadip
    Borkar, Vivek
    Karandikar, Abhay
    Chaporkar, Prasanna
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (07) : 3722 - 3729
  • [6] Robustness of policies in constrained Markov decision processes
    Zadorojniy, A
    Shwartz, A
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2006, 51 (04) : 635 - 638
  • [7] Learning in Constrained Markov Decision Processes
    Singh, Rahul
    Gupta, Abhishek
    Shroff, Ness B.
    [J]. IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2023, 10 (01) : 441 - 453
  • [8] Semi-Infinitely Constrained Markov Decision Processes and Provably Efficient Reinforcement Learning
    Zhang, Liangyu
    Peng, Yang
    Yang, Wenhao
    Zhang, Zhihua
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (05) : 3722 - 3735
  • [9] Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Xu, Huan
    Mannor, Shie
    [J]. MATHEMATICS OF OPERATIONS RESEARCH, 2016, 41 (04) : 1325 - 1353