Decentralized Learning in Finite Markov Chains: Revisited

Cited by: 6
Authors
Chang, Hyeong Soo [1]
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, Seoul 121742, South Korea
Keywords
Controlled Markov chain; decentralized learning; fictitious play; learning automata; Markov decision process; THEORETIC APPROACH; SIMULATION; HORIZON
DOI
10.1109/TAC.2009.2017977
Chinese Library Classification
TP [automation technology; computer technology]
Subject Classification Code
0812
Abstract
The convergence proof in the paper "Decentralized learning in finite Markov chains" (IEEE Transactions on Automatic Control, vol. AC-31, no. 6, pp. 519-526, 1986) is incomplete. This note first provides a sufficient condition for the existence of a unique optimal policy for infinite-horizon average-cost Markov decision processes (MDPs); under this condition the convergence result established by Wheeler and Narendra is preserved. We then present a novel simulation-based decentralized algorithm for average-cost MDPs, called "sampled joint-strategy fictitious play for MDPs," building on the recent study by Garcia et al. of a decentralized approach to discrete optimization via fictitious play applied to games with identical payoffs. For this algorithm we establish an almost-sure convergence result stronger than Wheeler and Narendra's: the sequence of probability distributions over the policy space of a given MDP generated by the algorithm converges to the unique optimal policy with probability one.
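The abstract only names the sampled joint-strategy fictitious play scheme; the sketch below illustrates the general idea it builds on, fictitious play over a game with identical payoffs in which each state of the MDP acts as a player. It samples a joint policy from each state's empirical action frequencies and has each state best-respond by evaluating the average cost of unilateral deviations. The toy model, the update rule, and all names here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Hedged sketch of joint-strategy fictitious play on a toy average-cost MDP.
# Each state acts as a player in a game with identical payoffs (the negated
# average cost of the joint policy). Everything below is an assumption for
# illustration, not the note's algorithm.

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

# P[a, s, :] = transition distribution from state s under action a.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
c = rng.random((n_states, n_actions))  # per-stage costs c(s, a)

def average_cost(policy):
    """Long-run average cost of a deterministic policy (assumes unichain)."""
    Ppi = np.array([P[policy[s], s] for s in range(n_states)])
    # Stationary distribution mu solves mu @ Ppi = mu with sum(mu) = 1.
    A = np.vstack([Ppi.T - np.eye(n_states), np.ones(n_states)])
    b = np.append(np.zeros(n_states), 1.0)
    mu = np.linalg.lstsq(A, b, rcond=None)[0]
    return mu @ c[np.arange(n_states), policy]

counts = np.ones((n_states, n_actions))  # empirical play frequencies
for _ in range(200):
    # Sample a joint strategy (a policy) from the empirical distributions.
    policy = np.array([rng.choice(n_actions, p=counts[s] / counts[s].sum())
                       for s in range(n_states)])
    # Each player best-responds, holding the other states' actions fixed.
    for s in range(n_states):
        deviations = [average_cost(np.where(np.arange(n_states) == s, a, policy))
                      for a in range(n_actions)]
        counts[s, int(np.argmin(deviations))] += 1

print("most-played policy:", counts.argmax(axis=1))
```

Under the note's uniqueness condition, almost-sure convergence means the empirical distributions concentrate on the optimal policy; the toy above illustrates only the mechanics, not the convergence proof.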
Pages: 1648-1653
Page count: 6