Single sample path-based optimization of Markov chains

Times cited: 31
Authors
Cao, XR [1]
Affiliation
[1] Hong Kong Univ Sci & Technol, Clear Water Bay, Kowloon, Peoples R China
Keywords
perturbation analysis; on-line optimization; Markov decision processes; performance potentials;
DOI
10.1023/A:1022634422482
Chinese Library Classification number
C93 [Management science]; O22 [Operations research];
Subject classification code
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
Motivated by the needs of on-line optimization of real-world engineering systems, we study single sample path-based algorithms for Markov decision problems (MDP). The sample path used in the algorithms can be obtained by observing the operation of a real system. We give a simple example to explain the advantages of the sample path-based approach over the traditional computation-based approach: matrix inversion is not required; some transition probabilities do not have to be known; it may save storage space; and it gives the flexibility of iterating the actions for a subset of the state space in each iteration. The effect of the estimation errors and the convergence property of the sample path-based approach are studied. Finally, we propose a fast algorithm, which updates the policy whenever the system reaches a particular set of states, and prove that the algorithm converges to the true optimal policy with probability one under some conditions. The sample path-based approach may have important applications to the design and management of engineering systems, such as high-speed communication networks.
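The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes an ergodic chain, an average reward that is to be maximized, and potentials defined relative to a reference state (the sum of reward minus average reward until the next visit to that state). The names `simulate`, `estimate_potentials`, and `improve`, the array layout `P[a, i, j]`, and the two-state example are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate(P, policy, n_steps, rng, x0=0):
    """Return one sample path of states under a fixed stationary policy.

    P[a, i, j] is the (assumed) probability of moving from i to j under action a.
    """
    path = np.empty(n_steps, dtype=int)
    x = x0
    for t in range(n_steps):
        path[t] = x
        x = rng.choice(P.shape[1], p=P[policy[x], x])
    return path

def estimate_potentials(path, rewards, n_states, ref=0):
    """Estimate the average reward and the potentials from a single path.

    The potential of state i is estimated as the mean of
    sum(reward - eta) from each visit to i until the next visit to `ref`,
    so the potential of `ref` itself is 0. No matrix inversion is used.
    """
    eta = rewards.mean()
    sums = np.zeros(n_states)
    counts = np.zeros(n_states)
    acc, valid = 0.0, False
    # Walk backward so `acc` always holds the sum up to the next visit to ref.
    for t in range(len(path) - 1, -1, -1):
        if path[t] == ref:
            acc, valid = 0.0, True
        else:
            acc += rewards[t] - eta
        if valid:  # skip the incomplete tail after the last visit to ref
            sums[path[t]] += acc
            counts[path[t]] += 1
    return eta, sums / np.maximum(counts, 1)

def improve(P, f, g):
    """One policy-improvement step: argmax over a of f[i, a] + sum_j P[a, i, j] g[j]."""
    q = f + np.einsum("aij,j->ia", P, g)
    return q.argmax(axis=1)

# Hypothetical two-state, two-action example.
P = np.array([[[0.5, 0.5], [0.5, 0.5]],    # action 0
              [[0.9, 0.1], [0.1, 0.9]]])   # action 1
f = np.array([[0.0, 0.0], [1.0, 1.0]])     # f[i, a]: reward in state i under action a
policy = np.array([0, 0])

rng = np.random.default_rng(0)
path = simulate(P, policy, 20000, rng)
eta, g = estimate_potentials(path, f[path, policy[path]], n_states=2)
new_policy = improve(P, f, g)
```

In this sketch the estimation step uses only the observed path and rewards; the improvement step still needs the transition probabilities of the candidate actions, which is a simplification made here rather than a claim about the paper.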
Pages: 527-548
Number of pages: 22
Related papers
50 in total
  • [1] Single Sample Path-Based Optimization of Markov Chains
    X. R. Cao
    [J]. Journal of Optimization Theory and Applications, 1999, 100 : 527 - 548
  • [2] A single sample path-based performance sensitivity formula for Markov chains
    Cao, XR
    Yuan, XM
    Qiu, L
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1996, 41 (12) : 1814 - 1817
  • [3] Single sample path-based sensitivity analysis of Markov processes using uniformization
    Liu, ZK
    Tu, FS
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1999, 44 (04) : 872 - 875
  • [4] Single sample path based optimization of Markov systems: Examples and algorithms
    Cao, XR
    [J]. PROCEEDINGS OF THE 36TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 1997, : 668 - 673
  • [5] On-line optimization algorithm for Markov control processes based on a single sample path
    Tang, Hao
    Xi, Hong-Sheng
    Yin, Bao-Qun
    [J]. Kongzhi Lilun Yu Yinyong/Control Theory and Applications, 2002, 19 (06):
  • [6] MIXING TIME ESTIMATION IN REVERSIBLE MARKOV CHAINS FROM A SINGLE SAMPLE PATH
    Hsu, Daniel
    Kontorovich, Aryeh
    Levin, David A.
    Peres, Yuval
    Szepesvari, Csaba
    Wolfer, Geoffrey
    [J]. ANNALS OF APPLIED PROBABILITY, 2019, 29 (04) : 2439 - 2480
  • [7] Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path
    Hsu, Daniel
    Kontorovich, Aryeh
    Szepesvari, Csaba
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [8] Sample path optimality for a Markov optimization problem
    Hunt, FY
    [J]. STOCHASTIC PROCESSES AND THEIR APPLICATIONS, 2005, 115 (05) : 769 - 779
  • [9] A PATH-BASED APPROACH TO CONSTRAINED SPARSE OPTIMIZATION
    Hallak, Nadav
    [J]. SIAM JOURNAL ON OPTIMIZATION, 2024, 34 (01) : 790 - 816
  • [10] Recursive approaches for single sample path based Markov reward processes
    Fang, H.T.
    Chen, H.F.
    Cao, X.R.
    [J]. Asian Journal of Control, 2001, 3 (01) : 21 - 26