Solving MDPs using two-timescale simulated annealing with multiplicative weights

Cited by: 0
Authors
Abdulla, Mohammed Shahid [1 ]
Bhatnagar, Shalabh [1 ]
Affiliation
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India
Keywords
Markov decision processes; reinforcement learning; two-timescale stochastic approximation; simulated annealing with multiplicative weights
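DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812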
Abstract
We develop extensions of the Simulated Annealing with Multiplicative Weights (SAMW) algorithm, which was proposed as a method for solving Finite-Horizon Markov Decision Processes (FH-MDPs). The extensions are in three directions: a) use of the dynamic programming principle in the policy update step of SAMW; b) a two-timescale actor-critic algorithm that uses simulated transitions alone; and c) extension of the algorithm to the infinite-horizon discounted-reward setting. In particular, a) reduces the storage required from exponential to linear in the number of actions per stage-state pair. On the faster timescale, a 'critic' recursion performs policy evaluation, while on the slower timescale an 'actor' recursion performs policy improvement using SAMW. We outline a proof of convergence with probability 1 and present experimental results in two settings: semiconductor fabrication and flow control in communication networks.
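To make the two-timescale idea in the abstract concrete, here is a minimal Python sketch of a generic actor-critic loop with a multiplicative-weights policy update on a small discounted MDP. It is not the algorithm from the paper, whose update rules are not given in this record. The function name `two_timescale_mw`, the step-size schedules, the base `beta`, and the per-state form of the weight update are all assumptions chosen for illustration.

```python
import numpy as np

def two_timescale_mw(P, R, gamma=0.9, n_iters=20000, beta=2.0, seed=0):
    """Illustrative sketch only (hypothetical names and schedules).

    P[s, a, s2] are transition probabilities and R[s, a] are rewards of a
    toy discounted MDP. Returns a randomized policy pi and a critic Q.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))                    # critic estimate
    pi = np.full((n_states, n_actions), 1.0 / n_actions)   # actor: action probabilities
    s = 0
    for n in range(1, n_iters + 1):
        a_n = 1.0 / n ** 0.6   # assumed faster (critic) step size
        b_n = 1.0 / n          # assumed slower (actor) step size, b_n / a_n -> 0
        # Simulate a single transition under the current policy.
        a = rng.choice(n_actions, p=pi[s])
        s_next = rng.choice(n_states, p=P[s, a])
        # Critic: policy evaluation from the simulated transition.
        target = R[s, a] + gamma * (pi[s_next] @ Q[s_next])
        Q[s, a] += a_n * (target - Q[s, a])
        # Actor: multiplicative-weights update, scaled by the slower step size.
        # Subtracting the row maximum only rescales the weights before normalizing.
        w = pi[s] * beta ** (b_n * (Q[s] - Q[s].max()))
        pi[s] = w / w.sum()
        s = s_next
    return pi, Q
```

Calling it with, say, `P = np.full((2, 2, 2), 0.5)` and `R = np.array([[0.0, 1.0], [0.0, 1.0]])` returns a policy whose rows are probability vectors over the two actions. The critic uses the larger step size so that, from the actor's point of view, the value estimates look nearly converged at each policy update, which is the usual motivation for separating the timescales.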
Pages: 2695-2700
Page count: 6