Sample Path Sharing in Simulation-Based Policy Improvement

Cited by: 0
Authors
Wu, Di [1]
Jia, Qing-Shan [1]
Chen, Chun-Hung [2, 3]
Affiliations
[1] Tsinghua Univ, Ctr Intelligent & Networked Syst CFINS, Dept Automat, TNLIST, Beijing 100084, Peoples R China
[2] George Mason Univ, Dept Syst Engn & Operat Res, Fairfax, VA 22030 USA
[3] Natl Taiwan Univ, Taipei, Taiwan
Keywords
Discrete event dynamic system; simulation-based optimization; optimal computing budget allocation; state aggregation; average reward; decision; optimization; allocation; selection; property
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Simulation-based policy improvement (SBPI) has been widely used to improve given base policies through simulation. The basic idea of SBPI is to estimate all the Q-factors for a given state using simulation, and then select the action that achieves the minimal cost. It is therefore of great importance to use the given budget efficiently in order to select the best action with high probability. Unlike existing budget allocation algorithms, which estimate Q-factors by independent simulation, we share the sample paths to improve the probability of correctly selecting the best action. Our method can be combined with equal allocation, Successive Rejects, and optimal computing budget allocation to enhance their probabilities of correct selection as well as to achieve better policies in SBPI. The improvement depends on the overlap in reachable states under different actions. Numerical results show that, with such overlap, combining our method with equal allocation, Successive Rejects, and optimal computing budget allocation produces a higher probability of correct selection as well as better policies in SBPI.
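The abstract's basic idea (estimate every Q-factor at a state by simulation, then pick the minimal-cost action) can be sketched as follows. This is a minimal illustration under equal allocation, not the authors' algorithm: the simulator interface `step(state, action, rng) -> (next_state, cost)`, the finite `horizon`, and all function names are hypothetical. In particular, `estimate_q_shared` shows only one assumed way of sharing sample paths, namely pooling the base-policy continuations of all paths that reach the same next state, whichever action led there; the record does not describe how the paper actually shares paths.

```python
from collections import defaultdict


def rollout_cost(step, base_policy, state, horizon, rng):
    """Total cost of following base_policy from `state` for `horizon` steps."""
    total = 0.0
    for _ in range(horizon):
        state, cost = step(state, base_policy(state), rng)
        total += cost
    return total


def estimate_q_independent(step, base_policy, state, actions,
                           n_per_action, horizon, rng):
    """Equal allocation with independent sample paths for each action."""
    q = {}
    for a in actions:
        samples = []
        for _ in range(n_per_action):
            nxt, cost = step(state, a, rng)
            tail = rollout_cost(step, base_policy, nxt, horizon - 1, rng)
            samples.append(cost + tail)
        q[a] = sum(samples) / len(samples)
    return q


def estimate_q_shared(step, base_policy, state, actions,
                      n_per_action, horizon, rng):
    """Assumed sharing scheme: continuations from a common next state are
    pooled across actions (states must be hashable)."""
    first = {a: [] for a in actions}  # (one-step cost, next state) per action
    tails = defaultdict(list)         # next state -> pooled continuation costs
    for a in actions:
        for _ in range(n_per_action):
            nxt, cost = step(state, a, rng)
            first[a].append((cost, nxt))
            tails[nxt].append(
                rollout_cost(step, base_policy, nxt, horizon - 1, rng))
    tail_mean = {s: sum(v) / len(v) for s, v in tails.items()}
    return {a: sum(c + tail_mean[s] for c, s in first[a]) / len(first[a])
            for a in actions}


def improved_action(q):
    """SBPI selection step: the action with the smallest estimated Q-factor."""
    return min(q, key=q.get)
```

In this sketch the pooled continuation estimate for a next state uses more samples whenever several actions can reach that state, which is one way to read the abstract's remark that the gain depends on the overlap in reachable states. With no overlap the two estimators coincide.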
Pages: 3291 - 3296
Page count: 6
Related papers
50 in total
  • [1] Efficient Computing Budget Allocation for Simulation-Based Policy Improvement
    Jia, Qing-Shan
    [J]. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2012, 9 (02) : 342 - 352
  • [2] Simulation-Based Policy Improvement for Energy Management in Commercial Office Buildings
    Jia, Qing-Shan
    Shen, Jian-Xiang
    Xu, Zhan-Bo
    Guan, Xiao-Hong
    [J]. IEEE TRANSACTIONS ON SMART GRID, 2012, 3 (04) : 2211 - 2223
  • [3] Convergence of simulation-based policy iteration
    Cooper, WL
    Henderson, SG
    Lewis, ME
    [J]. PROBABILITY IN THE ENGINEERING AND INFORMATIONAL SCIENCES, 2003, 17 (02) : 213 - 234
  • [4] An Efficient Simulation-Based Policy Improvement with Optimal Computing Budget Allocation Based on Accumulated Samples
    Huang, Xilang
    Choi, Seon Han
    [J]. ELECTRONICS, 2022, 11 (07)
  • [5] Matching EV Charging Load With Uncertain Wind: A Simulation-Based Policy Improvement Approach
    Huang, Qilong
    Jia, Qing-Shan
    Qiu, Zhifeng
    Guan, Xiaohong
    Deconinck, Geert
    [J]. IEEE TRANSACTIONS ON SMART GRID, 2015, 6 (03) : 1425 - 1433
  • [6] SIMULATION-BASED SAMPLE-SIZE ESTIMATE
    BOISSEL, JP
    PEYRIEUX, JC
    [J]. CONTROLLED CLINICAL TRIALS, 1988, 9 (03) : 285 - 285
  • [7] Simulation-based performance improvement for shipbuilding processes
    University of Michigan, Ann Arbor, MI, United States
    [J]. J Ship Prod, 2006, 2 (49-65):
  • [8] The feasibility of sharing simulation-based evaluation scenarios in anesthesiology
    Berkenstadt, H
    Kantor, GS
    Yusim, Y
    Gafni, N
    Perel, A
    Ezri, T
    Ziv, A
    [J]. ANESTHESIA AND ANALGESIA, 2005, 101 (04) : 1068 - 1074
  • [9] Bootstrapping Simulation-Based Algorithms with a Suboptimal Policy
    Truong-Huy Dinh Nguyen
    Silander, Tomi
    Lee, Wee-Sun
    Leong, Tze-Yun
    [J]. TWENTY-FOURTH INTERNATIONAL CONFERENCE ON AUTOMATED PLANNING AND SCHEDULING, 2014, : 181 - 189
  • [10] A Scenario Approach to Robust Simulation-based Path Planning
    Bopardikar, Shaunak D.
    Srivastava, Vaibhav
    [J]. 2022 AMERICAN CONTROL CONFERENCE, ACC, 2022, : 5024 - 5029