Finite horizon partially observable semi-Markov decision processes under risk probability criteria

Cited by: 0

Authors:

Wen, Xin [1]

Guo, Xianping [2,3]

Xia, Li [1,3]

Affiliations:

[1] School of Business, Sun Yat-sen University, Guangzhou, China

[2] School of Mathematics, Sun Yat-sen University, Guangzhou, China

[3] Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, China

Source:

Operations Research Letters | 2024 / Vol. 57

Funding:

National Natural Science Foundation of China

DOI:

10.1016/j.orl.2024.107187

Abstract:

This paper deals with a risk probability minimization problem for finite horizon partially observable semi-Markov decision processes, which are among the most general models for stochastic dynamic systems. In contrast to the expected discounted and average criteria, the optimality criterion investigated in this paper is to minimize the probability that the accumulated rewards do not reach a prescribed profit level at the finite terminal stage. First, the state space is augmented as the joint conditional distribution of the current unobserved state and the remaining profit goal. We introduce a class of policies depending on observable histories and a class of Markov policies that involve the observable process together with the joint conditional distribution. Then, under mild assumptions, we prove by iteration techniques that the value function is the unique solution to the optimality equation for the probability criterion. The existence of an (ϵ-)optimal Markov policy for this problem is established. Finally, we use a bandit problem with the probability criterion to demonstrate our main results, giving an effective algorithm and the corresponding numerical calculation for the semi-Markov model. Moreover, for the case of reduction to the discrete-time Markov model, we derive a concise solution. © 2024 Elsevier B.V.
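
To make the probability criterion in the abstract concrete, here is a minimal sketch of backward recursion on a state augmented with the remaining profit goal. It is not the paper's algorithm: it assumes a fully observable, discrete-time model with integer rewards, whereas the paper works with partially observable semi-Markov models and a joint conditional distribution as the state. The toy transition table `P`, the reward table `R`, and the function names are all hypothetical, and "not reaching the level" is taken here to mean a total reward strictly below the goal.

```python
from functools import lru_cache

# Hypothetical toy model (not from the paper): two states, two actions.
# P[state][action] maps next_state -> probability.
P = {
    0: {"safe": {0: 1.0}, "risky": {0: 0.5, 1: 0.5}},
    1: {"safe": {1: 1.0}, "risky": {0: 0.5, 1: 0.5}},
}
# R[state][action] is an integer reward collected when the action is taken.
R = {
    0: {"safe": 1, "risky": 0},
    1: {"safe": 3, "risky": 0},
}


def q_value(stages_left, state, goal, action):
    """Failure probability if `action` is taken now and the rest is optimal."""
    remaining = goal - R[state][action]  # goal left after this stage's reward
    return sum(
        prob * risk(stages_left - 1, nxt, remaining)
        for nxt, prob in P[state][action].items()
    )


@lru_cache(maxsize=None)
def risk(stages_left, state, goal):
    """Minimal probability that the total reward stays strictly below `goal`."""
    if stages_left == 0:
        # Assumed convention: failure means the goal is still positive at the end.
        return 1.0 if goal > 0 else 0.0
    return min(q_value(stages_left, state, goal, a) for a in P[state])


def best_action(stages_left, state, goal):
    """An action attaining the minimum in the recursion (needs stages_left >= 1)."""
    return min(P[state], key=lambda a: q_value(stages_left, state, goal, a))


if __name__ == "__main__":
    print(risk(2, 0, 3), best_action(2, 0, 3))
```

In this toy model both actions in state 0 yield the same expected two-stage reward, yet only the risky action has any chance of reaching a goal of 3, which illustrates how the probability criterion can rank policies differently from an expectation criterion.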

Citations

50 items in total

[21] Parameter decision in adaptive partially observable Markov decision process with finite planning horizon
Li, J.H.
Han, Z.Z.
Shanghai Jiaotong Daxue Xuebao/Journal of Shanghai Jiaotong University, 2000, 34 (12) : 1653 - 1657
[22] Risk probability optimization of finite horizon piecewise deterministic Markov decision processes
Huo, Haifeng
Wen, Xian
Optimization, 2024
[23] A corrected and improved computational scheme for finite-horizon partially observable Markov decision processes
Mukherjee, S.
Seth, K.
INFOR, 1991, 29 (03) : 206 - 212
[24] Optimal threshold probability and expectation in semi-Markov decision processes
Sakaguchi, Masahiko
Ohtsubo, Yoshio
Applied Mathematics and Computation, 2010, 216 (10) : 2947 - 2958
[25] Partially observable semi-Markov games with discounted payoff
Ghosh, Mrinal K.
Goswami, Anindya
Stochastic Analysis and Applications, 2006, 24 (05) : 1035 - 1059
[26] Partially Observable Risk-Sensitive Markov Decision Processes
Bäuerle, Nicole
Rieder, Ulrich
Mathematics of Operations Research, 2017, 42 (04) : 1180 - 1196
[27] Risk-aware semi-Markov decision processes
Isohätälä, Jukka
Haskell, William B.
2017 IEEE 56th Annual Conference on Decision and Control (CDC), 2017
[28] Risk-sensitive semi-Markov decision processes with general utilities and multiple criteria
Huang, Yonghui
Lian, Zhaotong
Guo, Xianping
Advances in Applied Probability, 2018, 50 (03) : 783 - 804
[29] A nonlinear programming model for partially observable Markov decision processes: finite-horizon case
Serin, Y.
European Journal of Operational Research, 1995, 86 (03) : 549 - 564
[30] Average criteria in denumerable semi-Markov decision chains under risk-aversion
Cavazos-Cadena, Rolando
Cruz-Suárez, Hugo
Montes-De-Oca, Raúl
Discrete Event Dynamic Systems, 2023, 33 : 221 - 256
