Solving MDPs using two-timescale simulated annealing with multiplicative weights

Cited by: 0

Authors:

Abdulla, Mohammed Shahid [1]

Bhatnagar, Shalabh [1]

Affiliations:

[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India

Source:

2007 AMERICAN CONTROL CONFERENCE, VOLS 1-13 | 2007

Keywords:

Markov decision processes; reinforcement learning; two timescale stochastic approximation; Simulated Annealing with Multiplicative Weights;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

We develop extensions of the Simulated Annealing with Multiplicative Weights (SAMW) algorithm, which was proposed as a method for solving Finite-Horizon Markov Decision Processes (FH-MDPs). The extensions are in three directions: a) use of the dynamic programming principle in the policy update step of SAMW; b) a two-timescale actor-critic algorithm that uses simulated transitions alone; and c) extension of the algorithm to the infinite-horizon discounted-reward scenario. In particular, a) reduces the storage required from exponential to linear in the number of actions per stage-state pair. On the faster timescale, a 'critic' recursion performs policy evaluation, while on the slower timescale an 'actor' recursion performs policy improvement using SAMW. We give a proof outlining convergence with probability 1 and show experimental results in two settings: semiconductor fabrication and flow control in communication networks.
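The two-timescale structure described in the abstract can be illustrated with a rough Python sketch. This is not the authors' algorithm: the step sizes `a_n` and `b_n`, the base `beta`, the per-state action distribution, and the toy MDP arrays `P` and `R` are all assumptions made for illustration. A critic estimates action values from simulated transitions with a faster step size, and an actor reweights each action's probability multiplicatively with a slower one.

```python
import numpy as np


def mw_update(policy, q_values, beta, step=1.0):
    """One multiplicative-weights step per state (illustrative, not the paper's rule).

    Each action's probability is scaled by beta ** (step * Q) and the row is
    renormalised. Subtracting the row maximum only rescales every weight in a
    row by the same factor, so the normalised result is unchanged.
    """
    shifted = q_values - q_values.max(axis=1, keepdims=True)
    weights = policy * np.power(beta, step * shifted)
    return weights / weights.sum(axis=1, keepdims=True)


def two_timescale_sketch(P, R, gamma=0.9, n_iters=2000, beta=2.0, seed=0):
    """Actor-critic loop on a discounted MDP using simulated transitions only.

    P[s, a] is a probability vector over next states and R[s, a] is the
    one-step reward (both hypothetical inputs for this sketch).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    policy = np.full((n_states, n_actions), 1.0 / n_actions)
    q = np.zeros((n_states, n_actions))
    for n in range(1, n_iters + 1):
        a_n = 1.0 / n ** 0.6  # faster (critic) step size, assumed schedule
        b_n = 1.0 / n         # slower (actor) step size, so b_n / a_n -> 0
        # Critic: policy evaluation from one simulated transition per pair.
        for s in range(n_states):
            for a in range(n_actions):
                s_next = rng.choice(n_states, p=P[s, a])
                a_next = rng.choice(n_actions, p=policy[s_next])
                target = R[s, a] + gamma * q[s_next, a_next]
                q[s, a] += a_n * (target - q[s, a])
        # Actor: policy improvement by multiplicative reweighting.
        policy = mw_update(policy, q, beta, step=b_n)
    return policy, q
```

Calling `two_timescale_sketch(P, R)` with `P` of shape `(S, A, S)` and `R` of shape `(S, A)` returns the final per-state action distribution and the critic's action-value estimates. Keeping one distribution over actions per state, rather than one weight per full policy, is the kind of storage saving the abstract attributes to extension a).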


Pages: 2695-2700

Page count: 6

50 items in total

[1] Global Convergence of Two-Timescale Actor-Critic for Solving Linear Quadratic Regulator
Chen, Xuyang
Duan, Jingliang
Liang, Yingbin
Zhao, Lin
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023 : 7087 - 7095
[2] Experimentation with Benders decomposition for solving the two-timescale stochastic generation capacity expansion problem
Vojvodic, Goran
Novoa, Luis J.
Jarrah, Ahmad I.
EURO JOURNAL ON COMPUTATIONAL OPTIMIZATION, 2023, 11
[3] Two-Timescale Voltage Regulation in Distribution Grids Using Deep Reinforcement Learning
Yang, Qiuling
Wang, Gang
Sadeghi, Alireza
Giannakis, Georgios B.
Sun, Jian
2019 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CONTROL, AND COMPUTING TECHNOLOGIES FOR SMART GRIDS (SMARTGRIDCOMM), 2019,
[4] CFD Prediction of Partload CO Emissions using a Two-Timescale Combustion Model
Wegner, Bernhard
Gruschka, Uwe
Krebs, Werner
Egorov, Y.
Forkel, H.
Ferreira, J.
Aschmoneit, Kai
PROCEEDINGS OF THE ASME TURBO EXPO 2010, VOL 2, PTS A AND B, 2010 : 103 - 112
[5] Two-Timescale Voltage Control in Distribution Grids Using Deep Reinforcement Learning
Sun, Jian (sunjian@bit.edu.cn), 1600, Institute of Electrical and Electronics Engineers Inc., United States (11):
[6] CFD Prediction of Partload CO Emissions Using a Two-Timescale Combustion Model
Wegner, Bernhard
Gruschka, Uwe
Krebs, Werner
Egorov, Y.
Forkel, H.
Ferreira, J.
Aschmoneit, Kai
JOURNAL OF ENGINEERING FOR GAS TURBINES AND POWER-TRANSACTIONS OF THE ASME, 2011, 133 (07):
[7] Two-timescale learning using idiotypic behaviour mediation for a navigating mobile robot
Whitbrook, Amanda M.
Aickelin, Uwe
Garibaldi, Jonathan M.
APPLIED SOFT COMPUTING, 2010, 10 (03) : 876 - 887
[8] Two-Timescale Voltage Control in Distribution Grids Using Deep Reinforcement Learning
Yang, Qiuling
Wang, Gang
Sadeghi, Alireza
Giannakis, Georgios B.
Sun, Jian
IEEE TRANSACTIONS ON SMART GRID, 2020, 11 (03) : 2313 - 2323
[9] Six-degree-of-freedom trajectory optimization using a two-timescale collocation architecture
Desai, Prasun N.
Conway, Bruce A.
JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 2008, 31 (05) : 1308 - 1315
[10] Optimal structured feedback policies for ABR flow control using two-timescale SPSA
Bhatnagar, S
Fu, MC
Marcus, SI
Fard, PJ
IEEE-ACM TRANSACTIONS ON NETWORKING, 2001, 9 (04) : 479 - 491
