Online Learning in Markov Decision Processes with Changing Cost Sequences

Cited by: 0
Authors
Dick, Travis [1]
Gyorgy, Andras [1]
Szepesvari, Csaba [1]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
ALGORITHM;
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this paper we consider online learning in finite Markov decision processes (MDPs) with changing cost sequences under full-information and bandit-information feedback. We propose to view this problem as an instance of online linear optimization and give two methods for it: MD2 (mirror descent with approximate projections) and the continuous exponential weights algorithm with Dikin walks. We provide a rigorous complexity analysis of these techniques together with near-optimal regret bounds; in particular, we take into account the computational cost of performing approximate projections in MD2. In the case of full-information feedback, our results complement existing ones. In the case of bandit-information feedback, we consider the online stochastic shortest path problem, a special case of the above MDP problems, and improve the existing results by removing the previous restrictive assumption that the state-visitation probabilities are uniformly bounded away from zero under all policies.
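As a rough illustration of the online linear optimization view mentioned in the abstract, the sketch below runs mirror descent with the entropic regularizer (exponentiated gradient) on the probability simplex under full-information feedback. It is not the paper's MD2 method: the feasible set here is a plain simplex rather than a set of MDP occupancy measures, the projection is exact rather than approximate, and the names `exponentiated_gradient`, `regret`, and `eta` are hypothetical choices made for this example.

```python
import math


def exponentiated_gradient(costs, eta):
    """Entropic mirror descent on the simplex against a sequence of linear costs.

    costs: list of cost vectors, one revealed per round (full information).
    eta:   step size (an assumed tuning parameter).
    Returns the list of points played and the learner's total cost.
    """
    d = len(costs[0])
    x = [1.0 / d] * d          # start from the uniform distribution
    plays, total = [], 0.0
    for c in costs:
        plays.append(list(x))
        # Linear cost of the current point: <x, c>.
        total += sum(xi * ci for xi, ci in zip(x, c))
        # Multiplicative update followed by normalization, which is the
        # exact KL projection back onto the simplex.
        w = [xi * math.exp(-eta * ci) for xi, ci in zip(x, c)]
        z = sum(w)
        x = [wi / z for wi in w]
    return plays, total


def regret(costs, total):
    """Learner's total cost minus the cost of the best fixed coordinate."""
    d = len(costs[0])
    best = min(sum(c[i] for c in costs) for i in range(d))
    return total - best
```

With costs in [0, 1] and `eta = sqrt(8 ln(d) / T)`, the standard analysis of this update bounds the regret by `sqrt(T ln(d) / 2)`; handling the structured feasible sets that arise from MDPs is where the approximate projections discussed in the abstract come in.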
Pages: 9