Markov Decision Processes with Arbitrary Reward Processes

Citations: 0

Authors:

Yu, Jia Yuan [1]

Mannor, Shie [1]

Shimkin, Nahum [2]

Affiliations:

[1] McGill Univ, Montreal, PQ H3A 2T5, Canada

[2] Technion, Haifa, Israel

Source:

RECENT ADVANCES IN REINFORCEMENT LEARNING | 2008 / Vol. 5323

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider a control problem where the decision maker interacts with a standard Markov decision process with the exception that the reward functions vary arbitrarily over time. We extend the notion of Hannan consistency to this setting, showing that, in hindsight, the agent can perform almost as well as every deterministic policy. We present efficient online algorithms in the spirit of reinforcement learning that ensure that the agent's performance loss, or regret, vanishes over time, provided that the environment is oblivious to the agent's actions. However, counterexamples indicate that the regret does not vanish if the environment is not oblivious.


Pages: 268 / +

Page count: 3
