A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes

Cited by: 21

Authors:

Bhatnagar, S [1]

Kumar, S [1]

Affiliations:

[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2004 / Vol. 49 / Issue 04

Keywords:

actor-critic algorithms; Markov decision processes; simultaneous perturbation stochastic approximation (SPSA); two timescale stochastic approximation;

DOI:

10.1109/TAC.2004.825622

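Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract: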

A two-timescale simulation-based actor-critic algorithm for solution of infinite horizon Markov decision processes with finite state and compact action spaces under the discounted cost criterion is proposed. The algorithm does gradient search on the slower timescale in the space of deterministic policies and uses simultaneous perturbation stochastic approximation-based estimates. On the faster scale, the value function corresponding to a given stationary policy is updated and averaged over a fixed number of epochs (for enhanced performance). The proof of convergence to a locally optimal policy is presented. Finally, numerical experiments using the proposed algorithm on flow control in a bottleneck link using a continuous time queueing model are shown.
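The simultaneous perturbation gradient estimate mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the faster-timescale critic is omitted, and a hypothetical quadratic cost (`quadratic_cost`) stands in for the value function of a policy. The function names, the constant perturbation size `delta`, and the step-size schedule `0.5 / n` are all assumptions made for illustration.

```python
import random


def spsa_gradient(f, theta, delta, rng):
    """Two-sided SPSA estimate of the gradient of f at theta.

    Every coordinate is perturbed at once by an independent +/-1 sign,
    so two evaluations of f suffice whatever the dimension of theta.
    """
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * s for t, s in zip(theta, signs)]
    minus = [t - delta * s for t, s in zip(theta, signs)]
    diff = f(plus) - f(minus)
    return [diff / (2.0 * delta * s) for s in signs]


def quadratic_cost(theta):
    # Hypothetical smooth cost with its minimum at (1, -2); a stand-in
    # for the discounted cost of a parameterised policy.
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2


def spsa_descent(f, theta0, steps=2000, delta=0.1, seed=0):
    """Gradient search driven by SPSA estimates (slower-timescale analogue)."""
    rng = random.Random(seed)
    theta = list(theta0)
    for n in range(1, steps + 1):
        step = 0.5 / n  # assumed decreasing step size
        grad = spsa_gradient(f, theta, delta, rng)
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    print(spsa_descent(quadratic_cost, [0.0, 0.0]))
```

Each individual estimate is noisy because the perturbation signs couple the coordinates; it is the decreasing step size that averages this noise out over many iterations. In an actor-critic setting, the two evaluations of `f` would themselves be noisy value estimates supplied by the faster-timescale recursion.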

Pages: 592 - 598

Page count: 7

50 items in total

[1] An actor-critic algorithm for constrained Markov decision processes
Borkar, VS
[J]. SYSTEMS & CONTROL LETTERS, 2005, 54 (03) : 207 - 213
[2] An Online Actor-Critic Algorithm with Function Approximation for Constrained Markov Decision Processes
Bhatnagar, Shalabh
Lakshmanan, K.
[J]. JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2012, 153 (03) : 688 - 708
[3] An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
Bhatnagar, Shalabh
[J]. SYSTEMS & CONTROL LETTERS, 2010, 59 (12) : 760 - 766
[4] Actor-critic algorithms for hierarchical Markov decision processes
Bhatnagar, S
Panigrahi, JR
[J]. AUTOMATICA, 2006, 42 (04) : 637 - 644
[5] An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes
Shalabh Bhatnagar
K. Lakshmanan
[J]. Journal of Optimization Theory and Applications, 2012, 153 : 688 - 708
[6] Improved Simultaneous Perturbation Stochastic Approximation-based Consensus Algorithm for Tracking*
Erofeeva, Victoria
Granichin, Oleg
[J]. 2023 31ST MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION, MED, 2023, : 850 - 855
[7] The actor-critic algorithm as multi-time-scale stochastic approximation
Vivek S Borkar
Vijaymohan R Konda
[J]. Sadhana, 1997, 22 : 525 - 543
[8] The actor-critic algorithm as multi-time-scale stochastic approximation
Borkar, VS
Konda, VR
[J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 1997, 22 (4) : 525 - 543
[9] A simultaneous deterministic perturbation actor-critic algorithm with an application to optimal mortgage refinancing
Chinthalapati, V. L. Raju
Bhatnagar, S.
[J]. PROCEEDINGS OF THE 45TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-14, 2006, : 4151 - 4156
[10] Consolidated actor-critic model for partially-observable Markov decision processes
Elhanany, I.
Niedzwiedz, C.
Liu, Z.
Livingston, S.
[J]. ELECTRONICS LETTERS, 2008, 44 (22) : 1317 - U41
