Online Learning in Markov Decision Processes with Changing Cost Sequences

Cited by: 0
Authors
Dick, Travis [1 ]
Gyorgy, Andras [1 ]
Szepesvari, Csaba [1 ]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
ALGORITHM
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper we consider online learning in finite Markov decision processes (MDPs) with changing cost sequences under full-information and bandit-information feedback. We propose to view this problem as an instance of online linear optimization, and present two methods for solving it: MD2 (mirror descent with approximate projections) and the continuous exponential weights algorithm with Dikin walks. We provide a rigorous complexity analysis of these techniques together with near-optimal regret bounds; in particular, we take into account the computational cost of performing the approximate projections in MD2. In the case of full-information feedback our results complement existing ones. In the case of bandit-information feedback we consider the online stochastic shortest path problem, a special case of the above MDP problems, and improve on existing results by removing the previously required restrictive assumption that the state-visitation probabilities are uniformly bounded away from zero under all policies.
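To make the online-linear-optimization view concrete, the following is a minimal sketch of online mirror descent with an entropic regularizer over the probability simplex, run against an arbitrary changing cost sequence. It is only an illustration of the general template the abstract refers to: the paper's MD2 algorithm operates over the occupancy-measure polytope of the MDP and relies on approximate projections, neither of which is reproduced here, and all names in the snippet (online_mirror_descent, eta, loss_fn) are illustrative rather than taken from the paper.

```python
import numpy as np

# Sketch only: entropic online mirror descent (exponentiated gradient) on the
# probability simplex, a simplified stand-in for mirror descent over the
# occupancy-measure polytope used by MD2. Names and parameters are assumptions.

def online_mirror_descent(loss_fn, dim, T, eta=0.1):
    """Run T rounds; loss_fn(t) returns the cost vector revealed at round t."""
    x = np.full(dim, 1.0 / dim)           # start from the uniform point
    total_loss = 0.0
    for t in range(T):
        cost = loss_fn(t)                  # adversary reveals the cost vector
        total_loss += float(cost @ x)      # incur the linear loss <cost, x_t>
        # Entropic mirror step: multiplicative update, then (exact) projection
        # back onto the simplex via normalization.
        x = x * np.exp(-eta * cost)
        x /= x.sum()
    return x, total_loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, T = 5, 1000
    costs = rng.uniform(size=(T, d))       # a synthetic changing cost sequence
    x_final, loss = online_mirror_descent(lambda t: costs[t], d, T)
    best_fixed = costs.sum(axis=0).min()   # loss of the best fixed coordinate
    print(f"algorithm loss: {loss:.1f}, best fixed action: {best_fixed:.1f}")
```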
Pages: 9