Markov Decision Processes with Arbitrary Reward Processes

Cited by: 0
Authors
Yu, Jia Yuan [1]
Mannor, Shie [1]
Shimkin, Nahum [2]
Affiliations
[1] McGill Univ, Montreal, PQ H3A 2T5, Canada
[2] Technion, Haifa, Israel
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
We consider a control problem where the decision maker interacts with a standard Markov decision process with the exception that the reward functions vary arbitrarily over time. We extend the notion of Hannan consistency to this setting, showing that, in hindsight, the agent can perform almost as well as every deterministic policy. We present efficient online algorithms in the spirit of reinforcement learning that ensure that the agent's performance loss, or regret, vanishes over time, provided that the environment is oblivious to the agent's actions. However, counterexamples indicate that the regret does not vanish if the environment is not oblivious.
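A minimal sketch of the regret notion the abstract refers to, under the standard convention (an assumption here, not stated in this record) that performance is compared in hindsight against the best stationary deterministic policy, with reward functions r_t(s, a) chosen by the environment and a realized state-action trajectory (s_t, a_t) generated by the agent:

% Hedged sketch; the notation below is assumed for illustration, not taken from the record.
\[
  \mathrm{Regret}_T \;=\; \max_{\pi \in \Pi_{\mathrm{det}}} \mathbb{E}\!\left[\sum_{t=1}^{T} r_t\bigl(s_t^{\pi}, \pi(s_t^{\pi})\bigr)\right] \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_t(s_t, a_t)\right]
\]

where \(\Pi_{\mathrm{det}}\) is the set of stationary deterministic policies and \(s_t^{\pi}\) is the state sequence induced by following \(\pi\). Hannan consistency, as extended in the abstract, asks that \(\mathrm{Regret}_T / T \to 0\) as \(T \to \infty\); the paper states this is achievable by efficient online algorithms when the reward sequence is oblivious to the agent's actions, and gives counterexamples when it is not.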
Pages: 268 - +
Number of pages: 3
Related Papers (50 records in total)
  • [1] Markov Decision Processes with Arbitrary Reward Processes
    Yu, Jia Yuan
    Mannor, Shie
    Shimkin, Nahum
    MATHEMATICS OF OPERATIONS RESEARCH, 2009, 34 (03) : 737 - 757
  • [2] Robust Average-Reward Markov Decision Processes
    Wang, Yue
    Velasquez, Alvaro
    Atia, George
    Prater-Bennette, Ashley
    Zou, Shaofeng
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023 : 15215 - 15223
  • [3] Functional Reward Markov Decision Processes: Theory and Applications
    Weng, Paul
    Spanjaard, Olivier
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2017, 26 (03)
  • [4] Converging Markov Decision Processes with Multiplicative Reward System
    Fujita, T.
    Bulletin of the Kyushu Institute of Technology - Pure and Applied Mathematics, 2023, 2023 (70) : 33 - 41
  • [5] Average-Reward Decentralized Markov Decision Processes
    Petrik, Marek
    Zilberstein, Shlomo
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007 : 1997 - 2002
  • [6] Partially observable Markov decision processes with reward information
    Cao, XR
    Guo, XP
    2004 43RD IEEE CONFERENCE ON DECISION AND CONTROL (CDC), VOLS 1-5, 2004 : 4393 - 4398
  • [7] Bounding reward measures of Markov models using the Markov decision processes
    Buchholz, Peter
    NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, 2011, 18 (06) : 919 - 930
  • [8] Markov Decision Processes - Discounted Expected Reward or Average Expected Reward
    White, D. J.
    JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 1993, 172 (02) : 375 - 384
  • [9] Perceptive evaluation for the optimal discounted reward in Markov decision processes
    Kurano, M
    Yasuda, M
    Nakagami, J
    Yoshida, Y
    MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, 3558 : 283 - 293
  • [10] Reversible Markov Decision Processes with an Average-Reward Criterion
    Cogill, Randy
    Peng, Cheng
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2013, 51 (01) : 402 - 418