Solving MDPs using two-timescale simulated annealing with multiplicative weights

Times cited: 0
Authors
Abdulla, Mohammed Shahid [1]
Bhatnagar, Shalabh [1]
Affiliation
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, Karnataka, India
Keywords
Markov decision processes; reinforcement learning; two-timescale stochastic approximation; Simulated Annealing with Multiplicative Weights
DOI
Not available
Chinese Library Classification (CLC) number
TP (automation technology, computer technology)
Subject classification code
0812
Abstract
We develop extensions of the Simulated Annealing with Multiplicative Weights (SAMW) algorithm, which was proposed as a method for solving finite-horizon Markov decision processes (FH-MDPs). The extensions proceed in three directions: (a) using the dynamic programming principle in the policy-update step of SAMW; (b) a two-timescale actor-critic algorithm that uses simulated transitions alone; and (c) extending the algorithm to the infinite-horizon discounted-reward setting. In particular, (a) reduces the required storage from exponential to linear in the number of actions per stage-state pair. On the faster timescale, a 'critic' recursion performs policy evaluation, while on the slower timescale an 'actor' recursion performs policy improvement using SAMW. We outline a proof of convergence with probability 1 (w.p. 1) and present experimental results in two settings: semiconductor fabrication and flow control in communication networks.
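To illustrate the actor-critic structure summarized in the abstract, the following Python sketch runs a two-timescale loop on a small discounted MDP. It is not the authors' algorithm: the expected-SARSA critic, the step-size schedules b_n = n^-0.6 (critic) and c_n = 1/n (actor), the fixed base beta = e, the names mw_update and two_timescale_mw, and the toy MDP are all assumptions made for illustration. The actor keeps one probability vector per state (one weight per state-action pair rather than one per policy, in the spirit of point (a)) and updates it multiplicatively, p(s,a) proportional to p(s,a) * beta**(c_n * Q(s,a)). The critic estimates Q for the current policy using only sampled next states.

```python
import math
import random


def mw_update(probs, q_row, beta, step):
    """One multiplicative-weights step on a single state's action distribution.

    p(a) <- p(a) * beta**(step * Q(a)), then renormalise. Subtracting max(Q)
    leaves the normalised result unchanged but avoids overflow.
    """
    m = max(q_row)
    w = [p * beta ** (step * (q - m)) for p, q in zip(probs, q_row)]
    z = sum(w)
    return [x / z for x in w]


def two_timescale_mw(P, R, gamma=0.9, beta=math.e, iters=5000, seed=0):
    """Hypothetical two-timescale actor-critic sketch for a discounted MDP.

    P[s][a] is a list of next-state probabilities (used only to sample
    simulated transitions); R[s][a] is the one-step reward.
    """
    rng = random.Random(seed)
    n_s, n_a = len(R), len(R[0])
    states = list(range(n_s))
    Q = [[0.0] * n_a for _ in states]          # critic: Q-estimates of current policy
    pi = [[1.0 / n_a] * n_a for _ in states]   # actor: one distribution per state
    for n in range(1, iters + 1):
        b = n ** -0.6   # faster (critic) step size
        c = 1.0 / n     # slower (actor) step size; c / b -> 0
        # Critic: one simulated transition per (s, a), expected-SARSA target.
        for s in states:
            for a in range(n_a):
                s_next = rng.choices(states, weights=P[s][a])[0]
                v_next = sum(p * q for p, q in zip(pi[s_next], Q[s_next]))
                Q[s][a] += b * (R[s][a] + gamma * v_next - Q[s][a])
        # Actor: multiplicative-weights improvement step in every state.
        for s in states:
            pi[s] = mw_update(pi[s], Q[s], beta, c)
    return pi, Q


# Hypothetical toy MDP: action 1 pays 1 more than action 0 and moves to
# state 1 with probability 0.8 instead of 0.2.
P = [[[0.8, 0.2], [0.2, 0.8]],
     [[0.8, 0.2], [0.2, 0.8]]]
R = [[0.0, 1.0],
     [1.0, 2.0]]
pi, Q = two_timescale_mw(P, R)
print([max(range(2), key=lambda a: row[a]) for row in pi])
```

In this toy MDP, action 1 beats action 0 in both states under every policy, so the actor's distributions should concentrate on action 1, and Q[0][1], Q[1][1] should approach 17.2 and 18.2, the discounted values of that policy. The ratio c_n / b_n = n^-0.4 tending to 0 is what lets the critic treat the actor's policy as quasi-static.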
Pages: 2695-2700
Number of pages: 6
Related papers (50 in total)
  • [11] EFFSAMWMIX: AN EFFICIENT STOCHASTIC MULTI-ARMED BANDIT ALGORITHM BASED ON A SIMULATED ANNEALING WITH MULTIPLICATIVE WEIGHTS
    Villari, Boby Chaitanya
    Abdulla, Mohammed Shahid
    GLOBAL AND NATIONAL BUSINESS THEORIES AND PRACTICE: BRIDGING THE PAST WITH THE FUTURE, 2017, pp. 1876-1890
  • [12] Solving the Course Scheduling Problem Using Simulated Annealing
    Aycan, E.
    Ayav, T.
    2009 IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE, VOLS 1-3, 2009, pp. 462-466
  • [13] Solving the flowshop scheduling problem using simulated annealing
    Yang, G.
    Advanced Institute of Convergence Information Technology, Busan, Republic of Korea, 1600 (04)
  • [14] IRS-Assisted MISO With Finite-Alphabet Inputs Using Two-Timescale CSI
    Xu, Hao
    Zang, Xujie
    Sun, Yunan
    Ouyang, Chongjun
    Yang, Hongwen
    ICC 2023 - IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, pp. 2828-2833
  • [15] Characterizing two-timescale nonlinear dynamics using finite-time Lyapunov exponents and subspaces
    Mease, K. D.
    Topcu, U.
    Aykutlug, E.
    Maggie, M.
    COMMUNICATIONS IN NONLINEAR SCIENCE AND NUMERICAL SIMULATION, 2016, vol. 36, pp. 148-174
  • [16] Solving the assignment problem using genetic algorithm and simulated annealing
    Sahu, Anshuman
    Tapadar, Rudrajit
    IMECS 2006: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, 2006, pp. 762+
  • [17] Solving location and layout problems using GIS and simulated annealing
    Miller, HJ
    GIS/LIS '96 - ANNUAL CONFERENCE AND EXPOSITION PROCEEDINGS, 1996, pp. 310-324
  • [18] Solving terminal allocation problem using simulated annealing arithmetic
    Author not given (Faculty of Electronic and Information Engineering, Zhejiang Wanli University, No. 8 South Qian Hu Road, Ningbo, Zhejiang Province, China)
    WSEAS Transactions on Systems, 2008, vol. 7, no. 12, pp. 1412-1422
  • [19] Solving a Multiple Objective Linear Program using simulated annealing
    Sarker, R
    Newton, C
    ASIA-PACIFIC JOURNAL OF OPERATIONAL RESEARCH, 2001, vol. 18, no. 1, pp. 109-120
  • [20] Solving the medical student scheduling problem using simulated annealing
    Zanazzo, Eugenia
    Ceschia, Sara
    Dovier, Agostino
    Schaerf, Andrea
    JOURNAL OF SCHEDULING, 2024