Simulation-based optimization of Markov decision processes: An empirical process theory approach

Cited by: 8
Authors
Jain, Rahul [1,2]
Varaiya, Pravin [3]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Los Angeles, CA 90089 USA
[2] Univ So Calif, ISE Dept, Los Angeles, CA 90089 USA
[3] Univ Calif Berkeley, Dept EECS, Berkeley, CA 94720 USA
Funding
National Science Foundation (USA);
Keywords
Markov decision processes; Learning algorithms; Monte Carlo simulation; Stochastic control; Optimization; Uniform convergence;
DOI
10.1016/j.automatica.2010.05.021
CLC Number
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
We generalize and build on the PAC learning framework for Markov decision processes developed in Jain and Varaiya (2006). We allow the reward function to depend on both the state and the action, and both the state and action spaces can potentially be countably infinite. We obtain an estimate for the value function of a Markov decision process, which assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds on the number of runs needed for the empirical average to converge to the expected reward uniformly over a class of policies, in terms of the VC or pseudo-dimension of the policy class. We then propose a framework for obtaining an epsilon-optimal policy from simulation, and provide the sample complexity of this approach. (C) 2010 Elsevier Ltd. All rights reserved.
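The core estimate described in the abstract can be sketched in a few lines: run a policy many times, accumulate the discounted reward along each trajectory, and average. The following is a minimal illustration of that general technique, not the paper's algorithm; the toy two-state chain, the function names, and all parameter values are assumptions for the example.

```python
import random

def simulate_discounted_reward(policy, step, reward, s0,
                               gamma=0.9, horizon=100, rng=None):
    """One independent simulation run: follow `policy` from state `s0`
    and accumulate the discounted reward r(s, a) along the trajectory."""
    rng = rng or random.Random()
    s, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):            # truncate the infinite horizon
        a = policy(s)
        total += discount * reward(s, a)  # reward depends on state AND action
        s = step(s, a, rng)
        discount *= gamma
    return total

def estimate_value(policy, step, reward, s0,
                   n_runs=1000, gamma=0.9, seed=0):
    """Empirical average of the discounted reward over n_runs
    independent simulation runs -- the estimate of V(policy)."""
    rng = random.Random(seed)
    return sum(simulate_discounted_reward(policy, step, reward, s0,
                                          gamma, rng=rng)
               for _ in range(n_runs)) / n_runs

# Toy two-state chain: the action names the target state,
# reached with probability 0.8; state 1 pays reward 1.
def step(s, a, rng):
    return a if rng.random() < 0.8 else 1 - a

reward = lambda s, a: float(s == 1)
value = estimate_value(lambda s: 1, step, reward, s0=0, gamma=0.9)
```

The paper's contribution is not this estimator itself but the uniform-convergence bounds: how large `n_runs` must be so the empirical average is within epsilon of the true expected reward simultaneously for every policy in a class, as a function of the class's VC or pseudo-dimension.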
Pages: 1297-1304
Page count: 8