A reinforcement learning based algorithm for Markov decision processes

Cited by: 1

Authors:

Bhatnagar, S [1]

Kumar, S [1]

Affiliation:

[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India

Source:

2005 International Conference on Intelligent Sensing and Information Processing, Proceedings | 2005

Keywords:

actor-critic algorithms; two-timescale stochastic approximation; Markov decision processes; reinforcement learning; simultaneous perturbation stochastic approximation

DOI:

10.1109/ICISIP.2005.1529448

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A variant of a recently proposed two-timescale, reinforcement-learning-based actor-critic algorithm is proposed for infinite-horizon discounted-cost Markov decision processes with finite state spaces and compact action spaces. On the faster timescale, the value function corresponding to a given stationary deterministic policy is updated and averaged, while the policy itself is updated on the slower timescale. The policy recursion uses the sign of the gradient estimate instead of the estimate itself. A potential advantage of using the sign function is significantly reduced computation and communication overhead in applications such as congestion control in communication networks and distributed computation. The convergence analysis of the algorithm is briefly sketched, and numerical experiments on a congestion control problem are presented.
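The sign-based policy recursion mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the record gives no update equations, so the two-sided SPSA estimate, the step-size schedule, the perturbation size `c`, the box `[0, 1]` for the policy parameters, and the function names are all assumptions. In particular, `toy_cost` is a hypothetical quadratic that stands in for the value estimate the faster-timescale critic would supply.

```python
import numpy as np

def spsa_gradient(cost_plus, cost_minus, delta, c):
    """Two-sided SPSA gradient estimate from two scalar cost evaluations
    taken at theta + c*delta and theta - c*delta (delta has +/-1 entries)."""
    return (cost_plus - cost_minus) / (2.0 * c * delta)

def sign_policy_step(theta, grad_est, step, low=0.0, high=1.0):
    """Move every policy parameter by a fixed step against the sign of the
    gradient estimate, then project back onto the compact set [low, high]."""
    return np.clip(theta - step * np.sign(grad_est), low, high)

def toy_cost(theta):
    # Hypothetical stand-in for the critic's value estimate of policy theta.
    target = np.array([0.2, 0.7, 0.5])
    return float(np.sum((theta - target) ** 2))

rng = np.random.default_rng(0)
theta = np.full(3, 0.9)   # one action parameter per state (assumed layout)
c = 0.05                  # perturbation size (assumed)
for n in range(1, 2001):
    step = 0.1 / n ** 0.6                               # decreasing step size (assumed)
    delta = rng.choice([-1.0, 1.0], size=theta.shape)   # random +/-1 perturbation
    grad_est = spsa_gradient(toy_cost(theta + c * delta),
                             toy_cost(theta - c * delta), delta, c)
    theta = sign_policy_step(theta, grad_est, step)
```

Only the sign of each gradient component is used, so a node in a distributed setting would need to receive a single bit per parameter per update, which is the kind of overhead reduction the abstract points to.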

Pages: 199-204

Number of pages: 6
