Learning Parameterized Prescription Policies and Disease Progression Dynamics using Markov Decision Processes

Cited by: 0

Authors:

Zhu, Henghui [1]

Xu, Tingting [1]

Paschalidis, Ioannis Ch [2,3]

Affiliations:

[1] Boston Univ, Ctr Informat & Syst Engn, Boston, MA 02215 USA

[2] Boston Univ, Dept Elect & Comp Engn, Div Syst Engn, 8 St Marys St, Boston, MA 02215 USA

[3] Boston Univ, Dept Biomed Engn, 8 St Marys St, Boston, MA 02215 USA

Source:

2019 AMERICAN CONTROL CONFERENCE (ACC) | 2019

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

We develop an algorithm for learning physicians' prescription policies and the disease progression dynamics from Electronic Health Record (EHR) data. The prescription protocol used by physicians is viewed as a control policy which is a function of an underlying disease state in a Markov Decision Process (MDP) framework. We assume that the transition probabilities and the policy of the MDP are parameterized using some known features, such that only a small portion of them are informative. Two ℓ1-regularized maximum likelihood estimation problems are formulated to learn the transition probabilities and the policy, respectively. A bound is established on the difference between the average reward of the estimated policy under the estimated transition dynamics and the original (unknown) policy under the true transition dynamics. Our result suggests that by using only a relatively small number of training samples, the estimate can achieve a low regret. We validate our theoretical results on a test MDP motivated by a disease treatment identification application.
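To make the regularized estimation step in the abstract concrete, here is a minimal sketch of ℓ1-regularized maximum likelihood for a feature-parameterized policy. The abstract does not state the parameterization or the solver, so everything here is an assumption for illustration: a softmax policy over linear feature scores, a proximal-gradient (ISTA) solver, synthetic data, and the hypothetical names `action_probs` and `fit_l1_policy`. It is not the authors' implementation.

```python
import numpy as np

def action_probs(theta, phi):
    """Softmax policy (assumed form): phi has shape (n, n_actions, d)."""
    logits = phi @ theta                              # (n, n_actions)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def fit_l1_policy(phi, actions, lam=0.05, step=0.2, iters=1000):
    """Minimize average negative log-likelihood + lam * ||theta||_1 with ISTA."""
    n, _, d = phi.shape
    theta = np.zeros(d)
    chosen = phi[np.arange(n), actions]               # features of observed actions
    for _ in range(iters):
        probs = action_probs(theta, phi)
        expected = np.einsum("na,nad->nd", probs, phi)  # E_policy[phi] per sample
        grad = (expected - chosen).mean(axis=0)       # gradient of the smooth part
        theta = theta - step * grad
        # Soft-thresholding: proximal operator of the l1 penalty
        theta = np.sign(theta) * np.maximum(np.abs(theta) - step * lam, 0.0)
    return theta

# Synthetic demonstration data (assumption): only 2 of 8 features are informative.
rng = np.random.default_rng(0)
n, n_actions, d = 2000, 3, 8
theta_true = np.zeros(d)
theta_true[:2] = [2.0, -1.5]
phi = rng.normal(size=(n, n_actions, d))
probs = action_probs(theta_true, phi)
actions = np.array([rng.choice(n_actions, p=p) for p in probs])

theta_hat = fit_l1_policy(phi, actions)
print(np.round(theta_hat, 2))
```

The ℓ1 penalty pushes the coefficients of uninformative features toward zero, which matches the abstract's assumption that only a small portion of the features matter. Under the same assumptions, the transition probabilities could be fitted with the same routine by treating next states as the "actions".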


Pages: 3438 - 3443

Page count: 6

50 items in total

[1] Learning Parameterized Policies for Markov Decision Processes through Demonstrations
Hanawal, Manjesh K.
Liu, Hao
Zhu, Henghui
Paschalidis, Ioannis Ch.
[J]. 2016 IEEE 55TH CONFERENCE ON DECISION AND CONTROL (CDC), 2016, : 7087 - 7092
[2] Learning Policies for Markov Decision Processes in Continuous Spaces
Paternain, Santiago
Bazerque, Juan Andres
Small, Austin
Ribeiro, Alejandro
[J]. 2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 4751 - 4758
[3] Learning Policies for Markov Decision Processes From Data
Hanawal, Manjesh Kumar
Liu, Hao
Zhu, Henghui
Paschalidis, Ioannis Ch.
[J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (06) : 2298 - 2309
[4] Variance minimization of parameterized Markov decision processes
Li Xia
[J]. Discrete Event Dynamic Systems, 2018, 28 : 63 - 81
[5] Variance minimization of parameterized Markov decision processes
Xia, Li
[J]. DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2018, 28 (01): : 63 - 81
[6] Learning deterministic policies in partially observable Markov decision processes
Miyazaki, K
Kobayashi, S
[J]. INTELLIGENT AUTONOMOUS SYSTEMS: IAS-5, 1998, : 250 - 257
[7] On Markov policies for minimax decision processes
Iwamoto, S
Tsurusaki, K
[J]. JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2001, 253 (01) : 58 - 78
[8] Parameterized Penalties in the Dual Representation of Markov Decision Processes
Ye, Fan
Zhou, Enlu
[J]. 2012 IEEE 51ST ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2012, : 870 - 876
[9] Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Brazdil, Tomas
Chatterjee, Krishnendu
Novotny, Petr
Vahala, Jiri
[J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9794 - 9801
[10] Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
Roy, Arghyadip
Borkar, Vivek
Karandikar, Abhay
Chaporkar, Prasanna
[J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (07) : 3722 - 3729
