Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

Cited: 0
|
Authors
Brazdil, Tomas [1 ]
Chatterjee, Krishnendu [2 ]
Novotny, Petr [1 ]
Vahala, Jiri [1 ]
Affiliations
[1] Masaryk Univ, Fac Informat, Brno, Czech Republic
[2] IST Austria, Klosterneuburg, Austria
Funding
Austrian Science Fund (FWF);
Keywords
GO;
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Markov decision processes (MDPs) are the de facto framework for sequential decision making under stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with a highly negative impact on the system. Risk-averse policies, on the other hand, require the probability of undesirable events to stay below a given threshold, but they do not optimize the expected payoff. We consider MDPs with discounted-sum payoff and failure states, which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that keep the probability of encountering a failure state below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
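The risk-constrained action selection via linear programming mentioned in the abstract can be sketched as a small LP: given per-action payoff estimates and failure-probability estimates (which the paper's algorithm obtains from search and the learned predictor), pick a randomized action distribution that maximizes expected payoff subject to the risk bound. A minimal sketch using SciPy; the function name and inputs are illustrative, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import linprog

def risk_constrained_action_distribution(q_values, risk_estimates, threshold):
    """Return a distribution over actions maximizing expected payoff
    subject to the expected failure probability staying <= threshold.

    q_values       -- estimated discounted-sum payoff per action
    risk_estimates -- estimated failure probability per action
    threshold      -- risk bound delta
    """
    n = len(q_values)
    res = linprog(
        c=-np.asarray(q_values),       # linprog minimizes, so negate payoffs
        A_ub=[risk_estimates],         # expected risk <= threshold
        b_ub=[threshold],
        A_eq=[np.ones(n)],             # probabilities sum to 1
        b_eq=[1.0],
        bounds=[(0.0, 1.0)] * n,
        method="highs",
    )
    return res.x if res.success else None

# Example: the high-payoff action is too risky on its own, so the
# optimal policy mixes the two actions to meet the 10% risk bound.
dist = risk_constrained_action_distribution(
    q_values=[10.0, 2.0], risk_estimates=[0.3, 0.0], threshold=0.1)
```

Randomization matters here: a deterministic choice would either violate the risk bound or forgo payoff, whereas the LP vertex solution mixes actions exactly at the bound.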
Pages: 9794-9801
Page count: 8
Related Papers
50 entries total
  • [1] Risk-Constrained Markov Decision Processes
    Borkar, Vivek
    Jain, Rahul
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2014, 59 (09) : 2574 - 2579
  • [2] Risk-constrained Markov Decision Processes
    Borkar, Vivek
    Jain, Rahul
    [J]. 49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, : 2664 - 2669
  • [3] Reinforcement Learning for Constrained Markov Decision Processes
    Gattami, Ather
    Bai, Qinbo
    Aggarwal, Vaneet
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [4] Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
    Chow, Yinlam
    Ghavamzadeh, Mohammad
    Janson, Lucas
    Pavone, Marco
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 18
  • [5] Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
    Roy, Arghyadip
    Borkar, Vivek
    Karandikar, Abhay
    Chaporkar, Prasanna
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (07) : 3722 - 3729
  • [6] Robustness of policies in constrained Markov decision processes
    Zadorojniy, A
    Shwartz, A
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2006, 51 (04) : 635 - 638
  • [7] Learning in Constrained Markov Decision Processes
    Singh, Rahul
    Gupta, Abhishek
    Shroff, Ness B.
    [J]. IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2023, 10 (01) : 441 - 453
  • [8] Semi-Infinitely Constrained Markov Decision Processes and Provably Efficient Reinforcement Learning
    Zhang, Liangyu
    Peng, Yang
    Yang, Wenhao
    Zhang, Zhihua
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (05) : 3722 - 3735
  • [9] Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Xu, Huan
    Mannor, Shie
    [J]. MATHEMATICS OF OPERATIONS RESEARCH, 2016, 41 (04) : 1325 - 1353