Solving MDPs using two-timescale simulated annealing with multiplicative weights

Cited by: 0
Authors
Abdulla, Mohammed Shahid [1]
Bhatnagar, Shalabh [1]
Affiliation
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India
Keywords
Markov decision processes; reinforcement learning; two timescale stochastic approximation; Simulated Annealing with Multiplicative Weights
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We develop extensions of the Simulated Annealing with Multiplicative Weights (SAMW) algorithm, which was proposed as a method for solving Finite-Horizon Markov Decision Processes (FH-MDPs). The extensions are in three directions: a) use of the dynamic programming principle in the policy update step of SAMW, b) a two-timescale actor-critic algorithm that uses simulated transitions alone, and c) extension of the algorithm to the infinite-horizon discounted-reward setting. In particular, a) reduces the storage required from exponential to linear in the number of actions per stage-state pair. On the faster timescale, a 'critic' recursion performs policy evaluation, while on the slower timescale an 'actor' recursion performs policy improvement using SAMW. We outline a proof of convergence with probability 1 (w.p. 1) and present experimental results in two settings: semiconductor fabrication and flow control in communication networks.
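To make the actor-critic structure concrete, here is a minimal sketch of the general idea in the infinite-horizon discounted setting. It is not the authors' exact recursion, and several choices are assumptions made for illustration: the 2-state, 2-action MDP is hypothetical; the critic is an expected-SARSA style Q update that draws one simulated transition for every (state, action) pair per iteration (generative-model access); the actor applies a multiplicative-weights update with factor beta_n = exp(a_n); and the step-size exponents are illustrative choices satisfying a_n / b_n -> 0. The policy keeps one action distribution per state, so storage grows linearly with the number of actions, whereas a distribution over all deterministic policies of a T-stage problem would need |A|^(T|S|) weights.

```python
import math
import random

# Hypothetical 2-state, 2-action discounted MDP (illustrative only, not from the paper).
# P[s][a] is the next-state distribution and R[s][a] the one-step reward.
P = [[[0.8, 0.2], [0.0, 1.0]],
     [[0.0, 1.0], [0.8, 0.2]]]
R = [[1.0, 0.0],
     [0.0, 0.0]]
GAMMA = 0.9


def two_timescale_mw(n_iters=20000, seed=0):
    """Sketch: fast critic (Q estimates) plus slow multiplicative-weights actor."""
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    # One action distribution per state: storage linear in the number of actions.
    pi = [[1.0 / n_actions] * n_actions for _ in range(n_states)]

    for n in range(1, n_iters + 1):
        b_n = 1.0 / (n + 1) ** 0.55  # faster timescale (critic), assumed exponent
        a_n = 1.0 / (n + 1) ** 0.8   # slower timescale (actor); a_n / b_n -> 0

        # Critic: policy evaluation from one simulated transition per (s, a) pair.
        for s in range(n_states):
            for a in range(n_actions):
                s_next = rng.choices(range(n_states), weights=P[s][a])[0]
                v_next = sum(p * qv for p, qv in zip(pi[s_next], q[s_next]))
                target = R[s][a] + GAMMA * v_next
                q[s][a] += b_n * (target - q[s][a])

        # Actor: multiplicative weights, pi(a|s) proportional to pi(a|s) * exp(a_n * Q(s, a)).
        for s in range(n_states):
            q_max = max(q[s])  # shift exponents for numerical stability
            w = [p * math.exp(a_n * (qv - q_max)) for p, qv in zip(pi[s], q[s])]
            total = sum(w)
            pi[s] = [x / total for x in w]

    return pi, q


pi, q = two_timescale_mw()
print("pi:", [[round(p, 4) for p in row] for row in pi])
print("Q:", [[round(v, 2) for v in row] for row in q])
```

On this toy MDP the optimal policy takes action 0 in state 0 and action 1 in state 1, and the Bellman equations give Q*(0, 0) = V*(0) = 8.2, so one would expect the returned policy to concentrate on those actions and q[0][0] to settle near 8.2. Because the critic uses the larger step size b_n, it approximately tracks the value of the slowly changing policy, which is the separation of timescales described above.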
Pages: 2695–2700
Number of pages: 6
Related papers (50 in total)
  • [1] Global Convergence of Two-Timescale Actor-Critic for Solving Linear Quadratic Regulator
    Chen, Xuyang
    Duan, Jingliang
    Liang, Yingbin
    Zhao, Lin
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 7087 - 7095
  • [2] Experimentation with Benders decomposition for solving the two-timescale stochastic generation capacity expansion problem
    Vojvodic, Goran
    Novoa, Luis J.
    Jarrah, Ahmad I.
    EURO JOURNAL ON COMPUTATIONAL OPTIMIZATION, 2023, 11
  • [3] Two-Timescale Voltage Regulation in Distribution Grids Using Deep Reinforcement Learning
    Yang, Qiuling
    Wang, Gang
    Sadeghi, Alireza
    Giannakis, Georgios B.
    Sun, Jian
    2019 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CONTROL, AND COMPUTING TECHNOLOGIES FOR SMART GRIDS (SMARTGRIDCOMM), 2019,
  • [4] CFD Prediction of Partload CO Emissions using a Two-Timescale Combustion Model
    Wegner, Bernhard
    Gruschka, Uwe
    Krebs, Werner
    Egorov, Y.
    Forkel, H.
    Ferreira, J.
    Aschmoneit, Kai
    PROCEEDINGS OF THE ASME TURBO EXPO 2010, VOL 2, PTS A AND B, 2010, : 103 - 112
  • [5] Two-Timescale Voltage Control in Distribution Grids Using Deep Reinforcement Learning
    Sun, Jian (sunjian@bit.edu.cn), 1600, Institute of Electrical and Electronics Engineers Inc., United States (11):
  • [6] CFD Prediction of Partload CO Emissions Using a Two-Timescale Combustion Model
    Wegner, Bernhard
    Gruschka, Uwe
    Krebs, Werner
    Egorov, Y.
    Forkel, H.
    Ferreira, J.
    Aschmoneit, Kai
    JOURNAL OF ENGINEERING FOR GAS TURBINES AND POWER-TRANSACTIONS OF THE ASME, 2011, 133 (07):
  • [7] Two-timescale learning using idiotypic behaviour mediation for a navigating mobile robot
    Whitbrook, Amanda M.
    Aickelin, Uwe
    Garibaldi, Jonathan M.
    APPLIED SOFT COMPUTING, 2010, 10 (03) : 876 - 887
  • [8] Two-Timescale Voltage Control in Distribution Grids Using Deep Reinforcement Learning
    Yang, Qiuling
    Wang, Gang
    Sadeghi, Alireza
    Giannakis, Georgios B.
    Sun, Jian
    IEEE TRANSACTIONS ON SMART GRID, 2020, 11 (03) : 2313 - 2323
  • [9] Six-degree-of-freedom trajectory optimization using a two-timescale collocation architecture
    Desai, Prasun N.
    Conway, Bruce A.
    JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 2008, 31 (05) : 1308 - 1315
  • [10] Optimal structured feedback policies for ABR flow control using two-timescale SPSA
    Bhatnagar, S
    Fu, MC
    Marcus, SI
    Fard, PJ
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2001, 9 (04) : 479 - 491