An Online Actor-Critic Algorithm with Function Approximation for Constrained Markov Decision Processes

Cited by: 46
Authors
Bhatnagar, Shalabh [1]
Lakshmanan, K. [1]
Affiliation
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India
Keywords
Actor-critic algorithm; Constrained Markov decision processes; Long-run average cost criterion; Function approximation; Stochastic approximation
DOI
10.1007/s10957-012-9989-5
Chinese Library Classification (CLC) numbers
C93 [Management science]; O22 [Operations research]
Discipline classification codes
070105; 12; 1201; 1202; 120202
Abstract
We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework, in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
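The abstract does not spell out the update rules. As a rough illustration only, the Python sketch below shows a generic three-timescale Lagrangian actor-critic on a small tabular constrained MDP under the long-run average cost criterion. A linear average-cost TD(0) critic evaluates the single-stage Lagrangian cost c(s, a) + λ(h(s, a) − α). A Boltzmann-policy actor is updated with the TD error. The multiplier λ ≥ 0 is updated by projected ascent on the estimated constraint violation. This is not the authors' algorithm. The function names, the one-hot features, the step-size exponents, and the toy transition and cost arrays are all hypothetical assumptions.

```python
import numpy as np


def boltzmann_policy(theta, phi_sa, s):
    """Action probabilities pi(a|s) proportional to exp(theta . phi(s, a))."""
    prefs = phi_sa[s] @ theta
    prefs = prefs - prefs.max()  # subtract max for numerical stability
    p = np.exp(prefs)
    return p / p.sum()


def lagrangian_actor_critic(P, cost, g_cost, alpha, n_steps=20000, seed=0):
    """Hypothetical three-timescale Lagrangian actor-critic (illustrative only).

    P:      transition probabilities, shape (S, A, S)
    cost:   single-stage objective cost c(s, a), shape (S, A)
    g_cost: single-stage constraint cost h(s, a), shape (S, A)
    alpha:  bound on the long-run average of h
    """
    rng = np.random.default_rng(seed)
    n_s, n_a = cost.shape
    # Assumed features: one-hot (s, a) for the actor, one-hot s for the critic.
    phi_sa = np.eye(n_s * n_a).reshape(n_s, n_a, n_s * n_a)
    f_s = np.eye(n_s)
    theta = np.zeros(n_s * n_a)  # actor (policy) parameters
    v = np.zeros(n_s)            # critic (differential value) weights
    rho = 0.0                    # running estimate of the average Lagrangian cost
    g_avg = 0.0                  # running estimate of the average constraint cost
    lam = 0.0                    # Lagrange multiplier, kept >= 0
    s = 0
    for n in range(1, n_steps + 1):
        # Step sizes: critic fastest, actor slower, multiplier slowest.
        a_n, b_n, c_n = n ** -0.6, n ** -0.8, 1.0 / n
        pi = boltzmann_policy(theta, phi_sa, s)
        a = rng.choice(n_a, p=pi)
        s_next = rng.choice(n_s, p=P[s, a])
        # Single-stage Lagrangian cost c + lam * (h - alpha).
        l_cost = cost[s, a] + lam * (g_cost[s, a] - alpha)
        # Average-cost TD(0) error with linear value approximation.
        delta = l_cost - rho + f_s[s_next] @ v - f_s[s] @ v
        rho += a_n * (l_cost - rho)
        g_avg += a_n * (g_cost[s, a] - g_avg)
        v += a_n * delta * f_s[s]
        # Score function of the Boltzmann policy: grad log pi(a|s).
        grad_log = phi_sa[s, a] - pi @ phi_sa[s]
        theta -= b_n * delta * grad_log  # descent step, since we minimize cost
        # Projected ascent on the multiplier using the constraint estimate.
        lam = max(0.0, lam + c_n * (g_avg - alpha))
        s = s_next
    return theta, v, rho, g_avg, lam


if __name__ == "__main__":
    # Hypothetical toy CMDP: action 1 is cheaper but consumes the constrained resource.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.6, 0.4], [0.1, 0.9]]])
    cost = np.array([[2.0, 0.5], [2.0, 0.5]])
    g_cost = np.array([[0.0, 1.0], [0.0, 1.0]])
    theta, v, rho, g_avg, lam = lagrangian_actor_critic(P, cost, g_cost, alpha=0.5)
    print("lambda:", lam, "average constraint cost:", g_avg)
```

The step-size exponents (0.6 for the critic, 0.8 for the actor, 1.0 for the multiplier) are one assumed way to separate timescales. The critic approximately tracks the current policy. The policy adapts while λ is nearly static. λ drifts slowly in response to constraint violation.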
Pages
688-708 (21 pages)