A reinforcement learning based algorithm for Markov decision processes

Times cited: 1
Authors
Bhatnagar, S [1]
Kumar, S [1]
Affiliations
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India
Keywords
actor-critic algorithms; two-timescale stochastic approximation; Markov decision processes; reinforcement learning; simultaneous perturbation stochastic approximation
DOI
10.1109/ICISIP.2005.1529448
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A variant of a recently proposed two-timescale reinforcement learning based actor-critic algorithm for infinite horizon discounted cost Markov decision processes with finite state and compact action spaces is proposed. On the faster timescale, the value function corresponding to a given stationary deterministic policy is updated and averaged, while the policy itself is updated on the slower timescale. The latter recursion uses the sign of the gradient estimate instead of the estimate itself. A potential advantage of using the sign function lies in significantly reduced computation and communication overheads in applications such as congestion control in communication networks and distributed computation. The convergence analysis of the algorithm is briefly sketched, and numerical experiments for a congestion control problem are presented.
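The two-timescale structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' recursion: the abstract does not give the update equations, so the two-sided SPSA difference, the step sizes `a_n` and `b_n`, the toy two-state cost model, and the names `toy_step`, `td_sweeps`, and `sign_actor_critic` are all assumptions made for illustration.

```python
import numpy as np

TARGETS = np.array([0.2, 0.8])  # assumed cost-minimizing action per state (toy choice)

def toy_step(state, action, rng):
    """Hypothetical simulator: quadratic cost, uniform action-independent transitions."""
    cost = (action - TARGETS[state]) ** 2
    next_state = int(rng.integers(2))
    return cost, next_state

def td_sweeps(V, policy, step, b, gamma, sweeps, rng):
    """Critic (faster timescale): averaged value updates for a fixed deterministic policy."""
    for _ in range(sweeps):
        for s in range(len(V)):
            cost, s_next = step(s, policy[s], rng)
            V[s] += b * (cost + gamma * V[s_next] - V[s])

def sign_actor_critic(step, n_states, gamma=0.5, n_iters=300,
                      sweeps=50, delta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.full(n_states, 0.5)   # deterministic policy: one action in [0, 1] per state
    V_plus = np.zeros(n_states)      # critic for the policy theta + delta * Delta
    V_minus = np.zeros(n_states)     # critic for the policy theta - delta * Delta
    for n in range(1, n_iters + 1):
        a_n = 0.1 / n                # slower (actor) step size
        b_n = n ** -0.6              # faster (critic) step size, a_n / b_n -> 0
        Delta = rng.choice([-1.0, 1.0], size=n_states)  # SPSA perturbation directions
        td_sweeps(V_plus, np.clip(theta + delta * Delta, 0.0, 1.0),
                  step, b_n, gamma, sweeps, rng)
        td_sweeps(V_minus, np.clip(theta - delta * Delta, 0.0, 1.0),
                  step, b_n, gamma, sweeps, rng)
        # Two-sided SPSA gradient estimate (an assumption of this sketch).
        grad_est = (V_plus - V_minus) / (2.0 * delta * Delta)
        # Actor (slower timescale): only the sign of the estimate is used,
        # followed by projection back onto the compact action set [0, 1].
        theta = np.clip(theta - a_n * np.sign(grad_est), 0.0, 1.0)
    return theta, V_plus

theta, V = sign_actor_critic(toy_step, n_states=2)
print(theta)
```

In this toy problem the learned actions should drift toward the assumed targets 0.2 and 0.8. Because the actor consumes only one sign per state and iteration, a distributed implementation would need to communicate a single bit per component, which is the kind of overhead reduction the abstract alludes to.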
Pages: 199-204
Number of pages: 6
Related papers
50 in total
  • [1] A reinforcement learning based algorithm for finite horizon Markov decision processes
    Bhatnagar, Shalabh
    Abdulla, Mohammed Shahid
    [J]. PROCEEDINGS OF THE 45TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-14, 2006, : 5519 - 5524
  • [2] Reinforcement learning algorithm for partially observable Markov decision processes
    Wang, Xue-Ning
    He, Han-Gen
    Xu, Xin
    [J]. Kongzhi yu Juece/Control and Decision, 2004, 19 (11) : 1263 - 1266
  • [3] An Inverse Reinforcement Learning Algorithm for semi-Markov Decision Processes
    Tan, Chuanfang
    Li, Yanjie
    Cheng, Yuhu
    [J]. 2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 1256 - 1261
  • [4] Reinforcement Learning for Constrained Markov Decision Processes
    Gattami, Ather
    Bai, Qinbo
    Aggarwal, Vaneet
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [5] Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Xu, Huan
    Mannor, Shie
    [J]. MATHEMATICS OF OPERATIONS RESEARCH, 2016, 41 (04) : 1325 - 1353
  • [6] A Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes
    Le, Tuyen P.
    Ngo Anh Vien
    Chung, Taechoong
    [J]. IEEE ACCESS, 2018, 6 : 49089 - 49102
  • [7] Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Autef, Arnaud
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [8] Reinforcement learning based algorithms for average cost Markov Decision Processes
    Abdulla, Mohammed Shahid
    Bhatnagar, Shalabh
    [J]. DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2007, 17 (01) : 23 - 52
  • [9] Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes
    Mohammed Shahid Abdulla
    Shalabh Bhatnagar
    [J]. Discrete Event Dynamic Systems, 2007, 17 : 23 - 52
  • [10] A sensitivity view of Markov decision processes and reinforcement learning
    Cao, XR
    [J]. MODELING, CONTROL AND OPTIMIZATION OF COMPLEX SYSTEMS: IN HONOR OF PROFESSOR YU-CHI HO, 2003, 14 : 261 - 283