ρ-POMDPs have Lipschitz-Continuous ε-Optimal Value Functions

Cited by: 0

Authors:

Fehr, Mathieu [1]

Buffett, Olivier [2]

Thomas, Vincent [2]

Dibangoye, Junes [3]

Affiliations:

[1] Ecole Normale Super, Rue Ulm, Paris, France

[2] Univ Lorraine, CNRS, INRIA, LORIA, Nancy, France

[3] Univ Lyon, INSA Lyon, INRIA, CITI, Lyon, France

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018) | 2018 / Volume 31

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a "fully observable" problem (a belief MDP) and exploiting the piecewise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex Δ). This approach has been extended to solving ρ-POMDPs, i.e., for information-oriented criteria, when the reward ρ is convex in Δ. General ρ-POMDPs can also be turned into "fully observable" problems, but with no means to exploit the PWLC property. In this paper, we focus on POMDPs and ρ-POMDPs with a λ_ρ-Lipschitz reward function, and demonstrate that, for finite horizons, the optimal value function is Lipschitz-continuous. Value function approximators are then proposed for both upper- and lower-bounding the optimal value function, and are shown to provide uniformly improvable bounds. This makes it possible to derive two algorithms from HSVI, which are empirically evaluated on various benchmark problems.
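To illustrate how Lipschitz continuity alone can yield upper and lower bounds on a value function, the following is a minimal sketch, not the authors' approximators or algorithms. It assumes a known Lipschitz constant `lam` with respect to the L1 norm on beliefs, and a handful of belief points at which an upper or lower bound on the optimal value is already known. The function names, the choice of norm, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def l1_distance(b1, b2):
    """L1 distance between two belief vectors."""
    diff = np.asarray(b1, dtype=float) - np.asarray(b2, dtype=float)
    return float(np.abs(diff).sum())


def upper_bound(b, points, lam):
    """Tightest upper bound at belief b implied by lam-Lipschitz continuity.

    points: iterable of (belief, u) pairs where u upper-bounds the value
    at that belief. Each pair defines an upward-opening cone.
    """
    return min(u + lam * l1_distance(b, bi) for bi, u in points)


def lower_bound(b, points, lam):
    """Tightest lower bound at belief b implied by lam-Lipschitz continuity.

    points: iterable of (belief, l) pairs where l lower-bounds the value
    at that belief. Each pair defines a downward-opening cone.
    """
    return max(l - lam * l1_distance(b, bi) for bi, l in points)


# Hypothetical two-state example (all values are made up).
lam = 1.0
upper_points = [((1.0, 0.0), 2.0), ((0.0, 1.0), 3.0)]
lower_points = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.5)]
b = (0.5, 0.5)

gap = upper_bound(b, upper_points, lam) - lower_bound(b, lower_points, lam)
print(gap)
```

Adding a new point with a tighter value can only shrink the gap between the two bounds at nearby beliefs, which is the kind of behaviour a point-based search in the style of HSVI can exploit.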


Pages: 11

