Decentralized Learning in Finite Markov Chains: Revisited

Cited by: 6
Authors
Chang, Hyeong Soo [1]
Affiliations
[1] Sogang Univ, Dept Comp Sci & Engn, Seoul 121742, South Korea
Keywords
Controlled Markov chain; decentralized learning; fictitious play; learning automata; Markov decision process; THEORETIC APPROACH; SIMULATION; HORIZON
DOI
10.1109/TAC.2009.2017977
Chinese Library Classification
TP [automation technology; computer technology]
Subject Classification Code
0812
Abstract
The convergence proof in the paper "Decentralized learning in finite Markov chains" (IEEE Transactions on Automatic Control, vol. AC-31, no. 6, pp. 519-526, 1986) is incomplete. This note first provides a sufficient condition for the existence of a unique optimal policy for infinite-horizon average-cost Markov decision processes (MDPs); under this condition the convergence result established by Wheeler and Narendra is preserved. We then present a novel simulation-based decentralized algorithm for average-cost MDPs, called "sampled joint-strategy fictitious play for MDPs," building on the recent study by Garcia et al. of a decentralized approach to discrete optimization via fictitious play applied to games with identical payoffs. For this algorithm we establish an almost-sure convergence result stronger than Wheeler and Narendra's: the sequence of probability distributions over the policy space of a given MDP generated by the algorithm converges to the unique optimal policy with probability one.
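The abstract only names the sampled joint-strategy fictitious play scheme; the sketch below illustrates the general idea it builds on, fictitious play over a game with identical payoffs in which each state of the MDP acts as a player. It samples a joint policy from each state's empirical action frequencies and has each state best-respond by evaluating the average cost of unilateral deviations. The toy model, the update rule, and all names here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Hedged sketch of joint-strategy fictitious play on a toy average-cost MDP.
# Each state acts as a player in a game with identical payoffs (the negated
# average cost of the joint policy). Everything below is an assumption for
# illustration, not the note's algorithm.

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

# P[a, s, :] = transition distribution from state s under action a.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
c = rng.random((n_states, n_actions))  # per-stage costs c(s, a)

def average_cost(policy):
    """Long-run average cost of a deterministic policy (assumes unichain)."""
    Ppi = np.array([P[policy[s], s] for s in range(n_states)])
    # Stationary distribution mu solves mu @ Ppi = mu with sum(mu) = 1.
    A = np.vstack([Ppi.T - np.eye(n_states), np.ones(n_states)])
    b = np.append(np.zeros(n_states), 1.0)
    mu = np.linalg.lstsq(A, b, rcond=None)[0]
    return mu @ c[np.arange(n_states), policy]

counts = np.ones((n_states, n_actions))  # empirical play frequencies
for _ in range(200):
    # Sample a joint strategy (a policy) from the empirical distributions.
    policy = np.array([rng.choice(n_actions, p=counts[s] / counts[s].sum())
                       for s in range(n_states)])
    # Each player best-responds, holding the other states' actions fixed.
    for s in range(n_states):
        deviations = [average_cost(np.where(np.arange(n_states) == s, a, policy))
                      for a in range(n_actions)]
        counts[s, int(np.argmin(deviations))] += 1

print("most-played policy:", counts.argmax(axis=1))
```

Under the note's uniqueness condition, almost-sure convergence means the empirical distributions concentrate on the optimal policy; the toy above illustrates only the mechanics, not the convergence proof.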
Pages: 1648-1653
Page count: 6