ρ-POMDPs have Lipschitz-Continuous ε-Optimal Value Functions

Authors
Fehr, Mathieu [1]
Buffett, Olivier [2]
Thomas, Vincent [2]
Dibangoye, Junes [3]
Affiliations
[1] Ecole Normale Super, Rue Ulm, Paris, France
[2] Univ Lorraine, CNRS, INRIA, LORIA, Nancy, France
[3] Univ Lyon, INSA Lyon, INRIA, CITI, Lyon, France
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018) | 2018 / Volume 31
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a "fully observable" problem (a belief MDP) and exploiting the piece-wise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex Δ). This approach has been extended to solving ρ-POMDPs (i.e., for information-oriented criteria) when the reward ρ is convex in Δ. General ρ-POMDPs can also be turned into "fully observable" problems, but with no means to exploit the PWLC property. In this paper, we focus on POMDPs and ρ-POMDPs with λ_ρ-Lipschitz reward function, and demonstrate that, for finite horizons, the optimal value function is Lipschitz-continuous. Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds. This allows proposing two algorithms derived from HSVI which are empirically evaluated on various benchmark problems.
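The idea of bounding a Lipschitz-continuous value function from both sides can be illustrated with a minimal sketch. This is not the paper's approximators, only the generic consequence of Lipschitz continuity: if V is λ-Lipschitz and its value is known at a few beliefs b_i, then V(b) lies between max_i (V(b_i) - λ‖b - b_i‖) and min_i (V(b_i) + λ‖b - b_i‖). The function name `lipschitz_bounds`, its parameters, and the choice of the 1-norm on the belief simplex are assumptions made for illustration.

```python
import numpy as np


def lipschitz_bounds(b, points, values, lam):
    """Bounds on V(b) implied by lam-Lipschitz continuity (1-norm assumed).

    b      : query belief, shape (n,)
    points : beliefs where V is known, shape (k, n)
    values : known values V(points[i]), shape (k,)
    lam    : Lipschitz constant (hypothetical input, assumed valid for V)
    """
    b = np.asarray(b, dtype=float)
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)

    # 1-norm distance from the query belief to every stored belief.
    dists = np.abs(points - b).sum(axis=1)

    # Each stored point defines a downward cone (lower bound) and an
    # upward cone (upper bound); keep the tightest of each.
    lower = float(np.max(values - lam * dists))
    upper = float(np.min(values + lam * dists))
    return lower, upper


# Two corner beliefs of a 2-state simplex with illustrative values.
points = [[1.0, 0.0], [0.0, 1.0]]
values = [0.0, 1.0]
lower, upper = lipschitz_bounds([0.5, 0.5], points, values, lam=1.0)
```

Adding more evaluated beliefs can only tighten both bounds, which is the property that point-based schemes such as HSVI-style algorithms rely on when they refine bounds along sampled trajectories.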
Pages: 11