Algebraic optimization of sequential decision problems

Cited by: 0
Authors
Dressler, Mareike [1]
Garrote-Lopez, Marina [4]
Montufar, Guido [2, 3, 4]
Mueller, Johannes [4]
Rose, Kemal [4]
Affiliations
[1] Univ New South Wales, Sch Math & Stat, Sydney, NSW 2052, Australia
[2] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA 90095 USA
[4] Max Planck Inst Math Sci, D-04103 Leipzig, SN, Germany
Funding
European Research Council;
Keywords
Partially observable Markov decision process; Algebraic degree; Polynomial optimization; State aggregation; State-action frequencies;
DOI
10.1016/j.jsc.2023.102241
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
We study the optimization of the expected long-term reward in finite partially observable Markov decision processes over the set of stationary stochastic policies. In the case of deterministic observations, also known as state aggregation, the problem is equivalent to optimizing a linear objective subject to quadratic constraints. We characterize the feasible set of this problem as the intersection of a product of affine varieties of rank-one matrices and a polytope. Based on this description, we obtain bounds on the number of critical points of the optimization problem. Finally, we conduct experiments in which we solve the KKT equations or the Lagrange equations over different boundary components of the feasible set, and we compare the results to the theoretical bounds and to other constrained optimization methods. (c) 2023 Elsevier Ltd. All rights reserved.
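The structure described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes a discounted reward criterion (the abstract only says "long-term reward"), and the toy sizes, the random model, the discount factor 0.9, and all names (`discounted_state_action_frequencies`, `obs_of_state`, `policy_o`) are hypothetical. It checks that, with deterministic observations, the objective is linear in the state-action frequencies and that the frequencies restricted to the states sharing one observation form a rank-one matrix.

```python
import numpy as np

def discounted_state_action_frequencies(P, mu, policy_s, gamma):
    """Return eta[s, a], the discounted state-action frequencies.

    P[s, a, t] is the probability of moving from state s to state t under
    action a, mu is the initial state distribution, and policy_s[s, a] is
    the probability of action a in state s. (Assumed discounted setting.)
    """
    n_states = P.shape[0]
    # State-to-state kernel induced by the policy.
    P_pi = np.einsum("sa,sat->st", policy_s, P)
    # rho solves rho^T (I - gamma * P_pi) = (1 - gamma) * mu^T.
    rho = np.linalg.solve((np.eye(n_states) - gamma * P_pi).T, (1 - gamma) * mu)
    return rho[:, None] * policy_s

rng = np.random.default_rng(0)
n_states, n_actions, n_obs = 4, 2, 2  # hypothetical toy sizes

# Random transition kernel, uniform initial distribution, random rewards.
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
mu = np.full(n_states, 1.0 / n_states)
reward = rng.random((n_states, n_actions))

# Deterministic observations (state aggregation): each state maps to one observation.
obs_of_state = np.array([0, 0, 1, 1])

# A stationary stochastic policy acts on observations only.
policy_o = rng.random((n_obs, n_actions))
policy_o /= policy_o.sum(axis=1, keepdims=True)
policy_s = policy_o[obs_of_state]  # induced state policy

eta = discounted_state_action_frequencies(P, mu, policy_s, gamma=0.9)

# The objective is linear in eta.
expected_reward = float(np.sum(eta * reward))

# For each observation, the block of eta over the states emitting it has rank one.
block_ranks = [
    int(np.linalg.matrix_rank(eta[obs_of_state == o], tol=1e-10))
    for o in range(n_obs)
]
print(expected_reward, block_ranks)
```

In this sketch the rank-one blocks arise because every state with the same observation shares the same action distribution, so each block is an outer product of a state-weight vector and that distribution. The vanishing 2x2 minors of these blocks give quadratic constraints of the kind the abstract mentions.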
Pages: 19