Solving MDPs using two-timescale simulated annealing with multiplicative weights

Authors
Abdulla, Mohammed Shahid [1]
Bhatnagar, Shalabh [1]
Affiliations
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India
Keywords
Markov decision processes; reinforcement learning; two-timescale stochastic approximation; Simulated Annealing with Multiplicative Weights
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We develop extensions of the Simulated Annealing with Multiplicative Weights (SAMW) algorithm, which was proposed as a method for solving Finite-Horizon Markov Decision Processes (FH-MDPs). The extensions are in three directions: a) use of the dynamic programming principle in the policy update step of SAMW; b) a two-timescale actor-critic algorithm that uses simulated transitions alone; and c) extension of the algorithm to the infinite-horizon discounted-reward scenario. In particular, a) reduces the storage required from exponential to linear in the number of actions per stage-state pair. On the faster timescale, a 'critic' recursion performs policy evaluation, while on the slower timescale an 'actor' recursion performs policy improvement using SAMW. We give a proof outline of convergence with probability 1 and show experimental results in two settings: semiconductor fabrication and flow control in communication networks.
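The two-timescale idea in the abstract can be illustrated with a loose sketch. This is not the paper's recursion: the function name `two_timescale_mw`, the step-size schedules, the fixed base `beta`, the tempering of the exponent by the actor step size, and the toy two-state MDP are all assumptions made for illustration. The sketch keeps one weight per state-action pair (rather than one per whole policy), runs a critic on the faster step size, and applies a multiplicative-weights actor update on the slower one, using simulated transitions only.

```python
import numpy as np

def two_timescale_mw(P, R, gamma=0.9, n_iters=10000, beta=2.0, seed=0):
    """Toy actor-critic with a multiplicative-weights actor (illustrative only).

    P has shape (states, actions, states); R has shape (states, actions).
    P is used only to simulate transitions, never to compute expectations.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))                    # critic estimates
    pi = np.full((n_states, n_actions), 1.0 / n_actions)   # randomized policy
    s = 0
    for n in range(1, n_iters + 1):
        b_n = n ** -0.6   # critic step size (faster timescale), assumed schedule
        a_n = 1.0 / n     # actor step size (slower: a_n / b_n -> 0), assumed schedule
        a = rng.choice(n_actions, p=pi[s])
        s_next = rng.choice(n_states, p=P[s, a])
        # Critic: evaluate the current randomized policy from one sampled transition.
        target = R[s, a] + gamma * (pi[s_next] @ Q[s_next])
        Q[s, a] += b_n * (target - Q[s, a])
        # Actor: multiplicative-weights update at the visited state, then renormalize.
        w = pi[s] * beta ** (a_n * Q[s])
        pi[s] = w / w.sum()
        s = s_next
    return pi, Q

# Hypothetical toy problem: action 1 always pays 1, action 0 pays 0,
# and transitions are uniform regardless of the action taken.
P = np.full((2, 2, 2), 0.5)
R = np.array([[0.0, 1.0], [0.0, 1.0]])
pi, Q = two_timescale_mw(P, R)
print(pi.round(3))
```

On this toy problem the policy weights should drift toward the higher-reward action in both states. The table `pi` grows linearly with the number of actions per state, which is the kind of storage saving the abstract attributes to extension a).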
Pages: 2695-2700
Page count: 6