Markov Decision Processes with Arbitrary Reward Processes

Cited by: 0
Authors
Yu, Jia Yuan [1]
Mannor, Shie [1]
Shimkin, Nahum [2]
Affiliations
[1] McGill Univ, Montreal, PQ H3A 2T5, Canada
[2] Technion, Haifa, Israel
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
We consider a control problem where the decision maker interacts with a standard Markov decision process with the exception that the reward functions vary arbitrarily over time. We extend the notion of Hannan consistency to this setting, showing that, in hindsight, the agent can perform almost as well as every deterministic policy. We present efficient online algorithms in the spirit of reinforcement learning that ensure that the agent's performance loss, or regret, vanishes over time, provided that the environment is oblivious to the agent's actions. However, counterexamples indicate that the regret does not vanish if the environment is not oblivious.
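A minimal sketch of the regret notion the abstract refers to, under the standard convention (an assumption here, not stated in this record) that performance is compared in hindsight against the best stationary deterministic policy, with reward functions r_t(s, a) chosen by the environment and a realized state-action trajectory (s_t, a_t) generated by the agent:

% Hedged sketch; the notation below is assumed for illustration, not taken from the record.
\[
  \mathrm{Regret}_T \;=\; \max_{\pi \in \Pi_{\mathrm{det}}} \mathbb{E}\!\left[\sum_{t=1}^{T} r_t\bigl(s_t^{\pi}, \pi(s_t^{\pi})\bigr)\right] \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_t(s_t, a_t)\right]
\]

where \(\Pi_{\mathrm{det}}\) is the set of stationary deterministic policies and \(s_t^{\pi}\) is the state sequence induced by following \(\pi\). Hannan consistency, as extended in the abstract, asks that \(\mathrm{Regret}_T / T \to 0\) as \(T \to \infty\); the paper states this is achievable by efficient online algorithms when the reward sequence is oblivious to the agent's actions, and gives counterexamples when it is not.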
Pages: 268 - +
Number of pages: 3
Related Papers (50 records in total)
  • [1] Markov Decision Processes with Arbitrary Reward Processes
    Yu, Jia Yuan
    Mannor, Shie
    Shimkin, Nahum
    MATHEMATICS OF OPERATIONS RESEARCH, 2009, 34 (03) : 737 - 757
  • [2] Robust Average-Reward Markov Decision Processes
    Wang, Yue
    Velasquez, Alvaro
    Atia, George
    Prater-Bennette, Ashley
    Zou, Shaofeng
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023 : 15215 - 15223
  • [3] Functional Reward Markov Decision Processes: Theory and Applications
    Weng, Paul
    Spanjaard, Olivier
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2017, 26 (03)
  • [4] Converging Markov Decision Processes with Multiplicative Reward System
    Fujita, T.
    Bulletin of the Kyushu Institute of Technology - Pure and Applied Mathematics, 2023, 2023 (70) : 33 - 41
  • [5] Average-Reward Decentralized Markov Decision Processes
    Petrik, Marek
    Zilberstein, Shlomo
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007 : 1997 - 2002
  • [6] Partially observable Markov decision processes with reward information
    Cao, XR
    Guo, XP
    2004 43RD IEEE CONFERENCE ON DECISION AND CONTROL (CDC), VOLS 1-5, 2004 : 4393 - 4398
  • [7] Bounding reward measures of Markov models using the Markov decision processes
    Buchholz, Peter
    NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, 2011, 18 (06) : 919 - 930
  • [8] Markov Decision Processes - Discounted Expected Reward or Average Expected Reward
    White, D. J.
    JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 1993, 172 (02) : 375 - 384
  • [9] Perceptive evaluation for the optimal discounted reward in Markov decision processes
    Kurano, M
    Yasuda, M
    Nakagami, J
    Yoshida, Y
    MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, 3558 : 283 - 293
  • [10] Reversible Markov Decision Processes with an Average-Reward Criterion
    Cogill, Randy
    Peng, Cheng
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2013, 51 (01) : 402 - 418