Partially Observable Markov Decision Processes and Performance Sensitivity Analysis

被引:15
|
作者
Li, Yanjie [1 ]
Yin, Baoqun [1 ]
Xi, Hongsheng [1 ]
机构
[1] Univ Sci & Technol China, Dept Automat, Hefei 230026, Peoples R China
基金
中国国家自然科学基金;
关键词
Finite-state controller (FSC); gradient estimation; partially observable Markov decision processes (POMDPs); policy iteration; sensitivity analysis;
D O I
10.1109/TSMCB.2008.927711
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The sensitivity-based optimization of Markov systems has become an increasingly important area. From the perspective of performance sensitivity analysis, policy-iteration algorithms and gradient estimation methods can be directly obtained for Markov decision processes (MDPs). In this correspondence, the sensitivity-based optimization is extended to average reward partially observable MDPs (POMDPs). We derive the performance-difference and performance-derivative formulas of POMDPs. On the basis of the performance-derivative formula, we present a new method to estimate the performance gradients. From the performance-difference formula, we obtain a sufficient optimality condition without the discounted reward formulation. We also propose a policy-iteration algorithm to obtain a nearly optimal finite-state-controller policy.
引用
收藏
页码:1645 / 1651
页数:7
相关论文
共 50 条
  • [1] Qualitative Analysis of Partially-Observable Markov Decision Processes
    Chatterjee, Krishnendu
    Doyen, Laurent
    Henzinger, Thomas A.
    [J]. MATHEMATICAL FOUNDATIONS OF COMPUTER SCIENCE 2010, 2010, 6281 : 258 - 269
  • [2] Partially Observable Markov Decision Processes: A Geometric Technique and Analysis
    Zhang, Hao
    [J]. OPERATIONS RESEARCH, 2010, 58 (01) : 214 - 228
  • [3] Partially Observable Markov Decision Processes and Robotics
    Kurniawati, Hanna
    [J]. ANNUAL REVIEW OF CONTROL ROBOTICS AND AUTONOMOUS SYSTEMS, 2022, 5 : 253 - 277
  • [4] A tutorial on partially observable Markov decision processes
    Littman, Michael L.
    [J]. JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2009, 53 (03) : 119 - 125
  • [5] Quantum partially observable Markov decision processes
    Barry, Jennifer
    Barry, Daniel T.
    Aaronson, Scott
    [J]. PHYSICAL REVIEW A, 2014, 90 (03):
  • [6] PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES WITH PARTIALLY OBSERVABLE RANDOM DISCOUNT FACTORS
    Martinez-Garcia, E. Everardo
    Minjarez-Sosa, J. Adolfo
    Vega-Amaya, Oscar
    [J]. KYBERNETIKA, 2022, 58 (06) : 960 - 983
  • [7] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    [J]. MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 601 - 608
  • [8] Nonapproximability results for partially observable Markov decision processes
    Lusena, C
    Goldsmith, J
    Mundhenk, M
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2001, 14 : 83 - 113
  • [9] Entropy Maximization for Partially Observable Markov Decision Processes
    Savas, Yagiz
    Hibbard, Michael
    Wu, Bo
    Tanaka, Takashi
    Topcu, Ufuk
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (12) : 6948 - 6955
  • [10] Structural Estimation of Partially Observable Markov Decision Processes
    Chang, Yanling
    Garcia, Alfredo
    Wang, Zhide
    Sun, Lu
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (08) : 5135 - 5141