An Online Actor-Critic Algorithm with Function Approximation for Constrained Markov Decision Processes

Cited by: 46

Authors:

Bhatnagar, Shalabh [1]

Lakshmanan, K. [1]

Affiliation:

[1] Indian Institute of Science, Department of Computer Science and Automation, Bangalore 560012, Karnataka, India

Source:

Journal of Optimization Theory and Applications | 2012, Vol. 153, No. 3

Keywords:

Actor-critic algorithm; Constrained Markov decision processes; Long-run average cost criterion; Function approximation; Stochastic approximation

DOI:

10.1007/s10957-012-9989-5

Chinese Library Classification (CLC):

C93 [Management Science]; O22 [Operations Research]

Discipline classification codes:

070105; 12; 1201; 1202; 120202

Abstract:

We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
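
For orientation, the Lagrange multiplier approach mentioned in the abstract can be written in a generic form. The sketch below is not taken from the paper. All symbols are assumed notation for a standard constrained average-cost problem: $\theta$ for the policy parameters, $c$ and $g_k$ for the single-stage objective and constraint costs, $J$ and $G_k$ for their long-run averages, $\alpha_k$ for the constraint thresholds, $\lambda_k$ for the multipliers, and $N$ for the number of constraints.

```latex
% Assumed notation; a generic constrained average-cost formulation,
% not the paper's exact statement.
\begin{align*}
  J(\theta)   &= \lim_{n \to \infty} \frac{1}{n}\,
                 \mathbb{E}\Big[ \sum_{t=0}^{n-1} c(s_t, a_t) \,\Big|\, \pi_\theta \Big], \\
  G_k(\theta) &= \lim_{n \to \infty} \frac{1}{n}\,
                 \mathbb{E}\Big[ \sum_{t=0}^{n-1} g_k(s_t, a_t) \,\Big|\, \pi_\theta \Big],
                 \qquad k = 1, \dots, N, \\[4pt]
  &\min_{\theta} \; J(\theta)
   \quad \text{subject to} \quad G_k(\theta) \le \alpha_k, \quad k = 1, \dots, N, \\[4pt]
  L(\theta, \lambda) &= J(\theta) + \sum_{k=1}^{N} \lambda_k \big( G_k(\theta) - \alpha_k \big),
                 \qquad \lambda_k \ge 0 .
\end{align*}
```

Under this reading, the constrained problem is relaxed into a search for a local saddle point of $L$: descent in $\theta$ and ascent in $\lambda$, with each multiplier kept nonnegative.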


Pages: 688-708

Page count: 21

