Solving MDPs using two-timescale simulated annealing with multiplicative weights

Cited by: 0
Authors
Abdulla, Mohammed Shahid [1]
Bhatnagar, Shalabh [1]
Affiliations
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India
Keywords
Markov decision processes; reinforcement learning; two-timescale stochastic approximation; Simulated Annealing with Multiplicative Weights
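DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812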
Abstract
We develop extensions of the Simulated Annealing with Multiplicative Weights (SAMW) algorithm, which was proposed as a method for solving Finite-Horizon Markov Decision Processes (FH-MDPs). The extensions are in three directions: a) use of the dynamic programming principle in the policy update step of SAMW; b) a two-timescale actor-critic algorithm that uses simulated transitions alone; and c) extension of the algorithm to the infinite-horizon discounted-reward scenario. In particular, a) reduces the storage required from exponential to linear in the number of actions per stage-state pair. In b), a 'critic' recursion performs policy evaluation on the faster timescale, while an 'actor' recursion performs policy improvement using SAMW on the slower timescale. We outline a proof of convergence with probability 1 and show experimental results in two settings: semiconductor fabrication and flow control in communication networks.
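The two-timescale structure described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the toy MDP, the function name `samw_actor_critic`, the step sizes `b_n = n^-0.6` and `a_n = 1/n`, and the exponential form of the multiplicative update are all assumptions chosen for illustration. It only shows a faster critic recursion fed by simulated transitions, and a slower multiplicative-weights actor that keeps one action distribution per stage-state pair.

```python
import math
import random


def samw_actor_critic(T=3, S=2, A=2, iters=2000, seed=0):
    """Toy two-timescale actor-critic with multiplicative-weights policy updates."""
    rng = random.Random(seed)

    # Hypothetical random finite-horizon MDP (not from the paper):
    # P[s][a] is a distribution over next states, R[s][a] a reward in [0, 1].
    P = []
    for s in range(S):
        P.append([])
        for a in range(A):
            w = [rng.random() for _ in range(S)]
            total = sum(w)
            P[s].append([x / total for x in w])
    R = [[rng.random() for _ in range(A)] for _ in range(S)]

    # Actor: one distribution over actions per (stage, state) pair, so storage
    # grows linearly in the number of actions instead of one weight per policy.
    phi = [[[1.0 / A] * A for _ in range(S)] for _ in range(T)]
    # Critic: Q[t][s][a] estimates reward-to-go under the current randomized policy.
    Q = [[[0.0] * A for _ in range(S)] for _ in range(T)]

    for n in range(1, iters + 1):
        b_n = 1.0 / n ** 0.6  # faster (critic) step size, assumed schedule
        a_n = 1.0 / n         # slower (actor) step size, a_n / b_n -> 0

        # Critic recursion: policy evaluation from one simulated transition
        # per (stage, state, action) triple.
        for t in range(T):
            for s in range(S):
                for a in range(A):
                    s_next = rng.choices(range(S), weights=P[s][a])[0]
                    v_next = 0.0
                    if t + 1 < T:
                        v_next = sum(
                            phi[t + 1][s_next][k] * Q[t + 1][s_next][k]
                            for k in range(A)
                        )
                    Q[t][s][a] += b_n * (R[s][a] + v_next - Q[t][s][a])

        # Actor recursion: multiplicative-weights update, then renormalize.
        for t in range(T):
            for s in range(S):
                w = [phi[t][s][a] * math.exp(a_n * Q[t][s][a]) for a in range(A)]
                z = sum(w)
                phi[t][s] = [x / z for x in w]

    return phi, Q
```

Calling `samw_actor_critic()` returns the per-stage, per-state action distributions and the critic estimates. In this sketch the shrinking exponent `a_n` stands in for an annealing schedule; the paper's actual schedule and convergence conditions are not given in the abstract.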
Pages: 2695-2700
Number of pages: 6