An Online Actor-Critic Algorithm with Function Approximation for Constrained Markov Decision Processes

Cited by: 46
Authors
Bhatnagar, Shalabh [1 ]
Lakshmanan, K. [1 ]
Affiliations
[1] Indian Institute of Science, Department of Computer Science and Automation, Bangalore 560012, Karnataka, India
Keywords
Actor-critic algorithm; Constrained Markov decision processes; Long-run average cost criterion; Function approximation; Stochastic approximation
DOI
10.1007/s10957-012-9989-5
Chinese Library Classification
C93 [Management Science]; O22 [Operations Research]
Subject Classification Codes
070105; 12; 1201; 1202; 120202
Abstract
We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average-cost Markov decision process (MDP) framework, in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
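To make the Lagrangian relaxation mentioned in the abstract concrete, the following is a minimal sketch of the standard constrained average-cost formulation; the symbols $J$, $G_i$, $c$, $g_i$, $\alpha_i$, and $\gamma_i$ are illustrative choices consistent with this class of problems, not necessarily the paper's exact notation. Under a policy $\pi_\theta$ with tunable parameter $\theta$, the objective and each constraint function are long-run averages of per-stage sample path costs $c$ and $g_i$:
\[
  J(\theta) = \lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\!\left[ c(s_t, a_t) \right],
  \qquad
  G_i(\theta) = \lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\!\left[ g_i(s_t, a_t) \right],
\]
and the problem is $\min_\theta J(\theta)$ subject to $G_i(\theta) \le \alpha_i$ for $i = 1, \dots, N$. The Lagrange multiplier method replaces this with a saddle-point problem over the Lagrangian
\[
  L(\theta, \gamma) = J(\theta) + \sum_{i=1}^{N} \gamma_i \left( G_i(\theta) - \alpha_i \right),
  \qquad \gamma_i \ge 0,
\]
which is addressed by descending in $\theta$ (the actor, guided by a critic that estimates values under function approximation) while ascending in the multipliers $\gamma$.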
Pages: 688-708
Page count: 21