Decentralized Learning in Finite Markov Chains: Revisited

Cited by: 6

Authors:

Chang, Hyeong Soo [1]

Affiliations:

[1] Sogang Univ, Dept Comp Sci & Engn, Seoul 121742, South Korea

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2009 / Vol. 54 / No. 07

Keywords:

Controlled Markov chain; decentralized learning; fictitious play; learning automata; Markov decision process; THEORETIC APPROACH; SIMULATION; HORIZON;

DOI:

10.1109/TAC.2009.2017977

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

The convergence proof in the paper "Decentralized learning in finite Markov chains," published in the IEEE Transactions on Automatic Control, vol. AC-31, no. 6, pp. 519-526, 1986, is incomplete. This note first provides a sufficient condition for the existence of a unique optimal policy for infinite-horizon average-cost Markov decision processes (MDPs), under which the convergence result established by Wheeler and Narendra is preserved. We then present a novel simulation-based decentralized algorithm, called "sampled joint-strategy fictitious play for MDP," for average-cost MDPs, based on the recent study by Garcia et al. of a decentralized approach to discrete optimization via fictitious play applied to games with identical payoffs. We establish a stronger almost-sure convergence result than Wheeler and Narendra's, showing that the sequence of probability distributions over the policy space for a given MDP generated by the algorithm converges to a unique optimal policy with probability one.


Pages: 1648-1653

Page count: 6

50 items in total

[1] DECENTRALIZED LEARNING IN FINITE MARKOV-CHAINS
WHEELER, RM
NARENDRA, KS
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1986, 31 (06) : 519 - 526
[2] FUNCTIONS OF FINITE MARKOV CHAINS
LEYSIEFF.FW
ANNALS OF MATHEMATICAL STATISTICS, 1965, 36 (03) : 1077 - &
[3] On a result for finite Markov chains
Kulathinal, Sangita
Ghosh, Lagnojita
INTERNATIONAL JOURNAL OF MATHEMATICAL EDUCATION IN SCIENCE AND TECHNOLOGY, 2006, 37 (04) : 498 - 502
[4] FUNCTIONS OF FINITE MARKOV CHAINS
LEYSIEFFER, FW
ANNALS OF MATHEMATICAL STATISTICS, 1967, 38 (01) : 206 - +
[5] On Learning Markov Chains
Hao, Yi
Orlitsky, Alon
Pichapati, Venkatadheeraj
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
[6] Decentralized learning in Markov games
Vrancx, Peter
Verbeeck, Katja
Nowe, Ann
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38 (04) : 976 - 981
[7] Cycloid decompositions of finite Markov chains
Kalpazidou, S
CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 1999, 18 (03) : 191 - 204
[8] Unified theory for finite Markov chains
Rhodes, John
Schilling, Anne
ADVANCES IN MATHEMATICS, 2019, 347 : 739 - 779
[9] FUNCTIONS OF FINITE MARKOV-CHAINS
DHARMADHIKARI, SW
ANNALS OF MATHEMATICAL STATISTICS, 1963, 34 (03) : 1022 - &
[10] The cutoff phenomenon in finite Markov chains
Diaconis, P
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1996, 93 (04) : 1659 - 1664
