Online Learning in Markov Decision Processes with Changing Cost Sequences

Cited by: 0
Authors
Dick, Travis [1 ]
Gyorgy, Andras [1 ]
Szepesvari, Csaba [1 ]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
ALGORITHM
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper we consider online learning in finite Markov decision processes (MDPs) with changing cost sequences under full-information and bandit-information feedback. We propose to view this problem as an instance of online linear optimization, and present two methods for solving it: MD2 (mirror descent with approximate projections) and the continuous exponential weights algorithm with Dikin walks. We provide a rigorous complexity analysis of these techniques together with near-optimal regret bounds; in particular, we take into account the computational cost of performing the approximate projections in MD2. In the case of full-information feedback our results complement existing ones. In the case of bandit-information feedback we consider the online stochastic shortest path problem, a special case of the above MDP problems, and improve on existing results by removing the previously required restrictive assumption that the state-visitation probabilities are uniformly bounded away from zero under all policies.
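To make the online-linear-optimization view concrete, the following is a minimal sketch of online mirror descent with an entropic regularizer over the probability simplex, run against an arbitrary changing cost sequence. It is only an illustration of the general template the abstract refers to: the paper's MD2 algorithm operates over the occupancy-measure polytope of the MDP and relies on approximate projections, neither of which is reproduced here, and all names in the snippet (online_mirror_descent, eta, loss_fn) are illustrative rather than taken from the paper.

```python
import numpy as np

# Sketch only: entropic online mirror descent (exponentiated gradient) on the
# probability simplex, a simplified stand-in for mirror descent over the
# occupancy-measure polytope used by MD2. Names and parameters are assumptions.

def online_mirror_descent(loss_fn, dim, T, eta=0.1):
    """Run T rounds; loss_fn(t) returns the cost vector revealed at round t."""
    x = np.full(dim, 1.0 / dim)           # start from the uniform point
    total_loss = 0.0
    for t in range(T):
        cost = loss_fn(t)                  # adversary reveals the cost vector
        total_loss += float(cost @ x)      # incur the linear loss <cost, x_t>
        # Entropic mirror step: multiplicative update, then (exact) projection
        # back onto the simplex via normalization.
        x = x * np.exp(-eta * cost)
        x /= x.sum()
    return x, total_loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, T = 5, 1000
    costs = rng.uniform(size=(T, d))       # a synthetic changing cost sequence
    x_final, loss = online_mirror_descent(lambda t: costs[t], d, T)
    best_fixed = costs.sum(axis=0).min()   # loss of the best fixed coordinate
    print(f"algorithm loss: {loss:.1f}, best fixed action: {best_fixed:.1f}")
```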
Pages: 9