Online Learning in Markov Decision Processes with Changing Cost Sequences

Times cited: 0

Authors:

Dick, Travis [1]

Gyorgy, Andras [1]

Szepesvari, Csaba [1]

Affiliation:

[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1) | 2014 / Vol. 32

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

ALGORITHM;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we consider online learning in finite Markov decision processes (MDPs) with changing cost sequences under full-information and bandit-information feedback. We propose to view this problem as an instance of online linear optimization and propose two methods for it: MD2 (mirror descent with approximate projections) and the continuous exponential weights algorithm with Dikin walks. We provide a rigorous complexity analysis of these techniques together with near-optimal regret bounds (in particular, the analysis of MD2 accounts for the computational cost of performing the approximate projections). In the case of full-information feedback, our results complement existing ones. In the case of bandit-information feedback, we consider the online stochastic shortest path problem, a special case of the above MDP problems, and improve on existing results by removing the previously required restrictive assumption that the state-visitation probabilities are uniformly bounded away from zero under all policies.
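The reduction to online linear optimization mentioned in the abstract rests on a standard fact: in the stationary, average-cost view, a policy's expected per-step cost equals the inner product of its state-action occupancy measure with the cost vector, and the set of valid occupancy measures is a polytope cut out by linear flow constraints. The sketch below illustrates only that fact on a hypothetical two-state, two-action MDP whose transition probabilities are made up for illustration; the names `occupancy`, `flow_residual`, and `expected_cost` are illustrative helpers, not taken from the paper. It does not implement MD2 or the Dikin-walk sampler, and it ignores the mismatch between the learner's actual state distribution and the stationary one when the policy changes over time.

```python
# Minimal sketch on a hypothetical toy MDP (numbers are illustrative, not from
# the paper): a policy's stationary expected per-step cost is the linear
# function <mu, c> of its state-action occupancy measure mu.

STATES = (0, 1)
ACTIONS = (0, 1)

# P[(x, a)][y] = probability of moving from state x to state y under action a.
P = {
    (0, 0): (0.9, 0.1),
    (0, 1): (0.2, 0.8),
    (1, 0): (0.7, 0.3),
    (1, 1): (0.1, 0.9),
}


def induced_chain(policy):
    """State-to-state transition matrix when action a is drawn w.p. policy[x][a]."""
    return [
        [sum(policy[x][a] * P[(x, a)][y] for a in ACTIONS) for y in STATES]
        for x in STATES
    ]


def stationary_distribution(policy, iters=1000):
    """Power iteration; assumes the induced chain is ergodic (true for this toy MDP)."""
    chain = induced_chain(policy)
    nu = [1.0 / len(STATES)] * len(STATES)
    for _ in range(iters):
        nu = [sum(nu[x] * chain[x][y] for x in STATES) for y in STATES]
    return nu


def occupancy(policy):
    """mu(x, a) = nu(x) * pi(a | x): long-run fraction of steps spent in (x, a)."""
    nu = stationary_distribution(policy)
    return {(x, a): nu[x] * policy[x][a] for x in STATES for a in ACTIONS}


def flow_residual(mu):
    """Largest violation of the flow constraints
    sum_a mu(y, a) = sum_{x, a} mu(x, a) * P(y | x, a); zero for valid occupancies."""
    return max(
        abs(
            sum(mu[(y, a)] for a in ACTIONS)
            - sum(mu[(x, a)] * P[(x, a)][y] for x in STATES for a in ACTIONS)
        )
        for y in STATES
    )


def expected_cost(mu, cost):
    """Linear loss <mu, c>; cost maps (state, action) pairs to numbers."""
    return sum(mu[key] * cost[key] for key in mu)


if __name__ == "__main__":
    pi = {0: (0.5, 0.5), 1: (0.3, 0.7)}
    mu = occupancy(pi)
    cost = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 0.2}
    print("total mass:", round(sum(mu.values()), 6))
    print("flow constraints hold:", flow_residual(mu) < 1e-9)
    print("expected per-step cost:", round(expected_cost(mu, cost), 4))
```

Because the cost enters only through `expected_cost`, the cumulative cost of a fixed policy against a changing cost sequence is the inner product of one fixed point `mu` with the summed costs, which is what lets online linear optimization methods operate directly on the occupancy polytope.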


Pages: 9

Related documents (50 in total):

[1] Online Learning in Markov Decision Processes with Arbitrarily Changing Rewards and Transitions
Yu, Jia Yuan
Mannor, Shie
[J]. 2009 INTERNATIONAL CONFERENCE ON GAME THEORY FOR NETWORKS (GAMENETS 2009), 2009, : 314 - 322
[2] Blackwell Online Learning for Markov Decision Processes
Li, Tao
Peng, Guanze
Zhu, Quanyan
[J]. 2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021,
[3] Online Learning in Kernelized Markov Decision Processes
Chowdhury, Sayak Ray
Gopalan, Aditya
[J]. 22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
[4] Online Learning of Safety function for Markov Decision Processes
Mazumdar, Abhijit
Wisniewski, Rafal
Bujorianu, Manuela L.
[J]. 2023 EUROPEAN CONTROL CONFERENCE, ECC, 2023,
[5] Online Learning in Markov Decision Processes with Continuous Actions
Hong, Yi-Te
Lu, Chi-Jen
[J]. ALGORITHMIC LEARNING THEORY, ALT 2015, 2015, 9355 : 302 - 316
[6] Learning algorithms for Markov decision processes with average cost
Abounadi, J
Bertsekas, D
Borkar, VS
[J]. SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2001, 40 (03) : 681 - 698
[7] Online Markov Decision Processes
Even-Dar, Eyal
Kakade, Sham M.
Mansour, Yishay
[J]. MATHEMATICS OF OPERATIONS RESEARCH, 2009, 34 (03) : 726 - 736
[8] Online Learning with Implicit Exploration in Episodic Markov Decision Processes
Ghasemi, Mahsa
Hashemi, Abolfazl
Vikalo, Haris
Topcu, Ufuk
[J]. 2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 1953 - 1958
[9] Online Markov Decision Processes With Kullback-Leibler Control Cost
Guan, Peng
Raginsky, Maxim
Willett, Rebecca M.
[J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2014, 59 (06) : 1423 - 1438
[10] Online Markov Decision Processes with Kullback-Leibler Control Cost
Guan, Peng
Raginsky, Maxim
Willett, Rebecca
[J]. 2012 AMERICAN CONTROL CONFERENCE (ACC), 2012, : 1388 - 1393
