Efficient Algorithms for Budget-Constrained Markov Decision Processes

Cited by: 3

Authors:

Caramanis, Constantine ^{[1]}

Dimitrov, Nedialko B. ^{[2]}

Morton, David P. ^{[3]}

Affiliations:

[1] Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA

[2] Naval Postgrad Sch, Dept Operat Res, Monterey, CA 93943 USA

[3] Univ Texas Austin, Grad Program Operat Res, Austin, TX 78712 USA

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2014 / Vol. 59 / No. 10

Funding:

National Science Foundation (USA);

Keywords:

Markov decision processes (MDPs);

DOI:

10.1109/TAC.2014.2314211

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Discounted, discrete-time, discrete state-space, discrete action-space Markov decision processes (MDPs) form a classical topic in control, game theory, and learning, and as a result are widely applied, increasingly in very large-scale applications. Many algorithms have been developed to solve large-scale MDPs. Algorithms based on value iteration are particularly popular, as they are more efficient than the generic linear programming approach by an order of magnitude in the number of states of the MDP. Yet in the case of budget-constrained MDPs, no algorithm more efficient than linear programming is known. The theoretically slower running times of linear programming may limit the scalability of constrained MDPs in practice, while, theoretically, they invite the question of whether the increase is somehow intrinsic. In this technical note we show that it is not, and provide two algorithms for budget-constrained MDPs that are as efficient as value iteration. Denoting the running time of value iteration by VI, and the magnitude of the input by U, for an MDP with m expected budget constraints our first algorithm runs in time $O(\mathrm{poly}(m, \log U) \cdot \mathrm{VI})$. Given a pre-specified degree of precision, $\eta$, for satisfying the budget constraints, our second algorithm runs in time $O(\log m \cdot \mathrm{poly}(\log U) \cdot (1/\eta^2) \cdot \mathrm{VI})$, but may produce solutions that overutilize each of the m budgets by a multiplicative factor of $1 + \eta$. In fact, one can substitute value iteration with any algorithm, possibly specially designed for a specific MDP, that solves the MDP quickly to achieve similar theoretical guarantees. Both algorithms restrict attention to constrained infinite-horizon MDPs under discounted costs.
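The abstract treats value iteration as a black-box subroutine whose running time is denoted VI. For orientation only, a minimal sketch of standard value iteration for an unconstrained, discounted, finite MDP might look like the following. It is not the budget-constrained algorithm of the paper; the function name `value_iteration`, the argument layout (`P[a][s][t]`, `R[s][a]`), the tolerance `tol`, and the reward-maximizing convention (rather than cost-minimizing) are all assumptions made for illustration.

```python
from typing import List, Tuple


def value_iteration(
    P: List[List[List[float]]],  # P[a][s][t]: probability of moving s -> t under action a (assumed layout)
    R: List[List[float]],        # R[s][a]: immediate reward for taking action a in state s (assumed layout)
    gamma: float,                # discount factor, 0 <= gamma < 1
    tol: float = 1e-8,           # stop when successive value vectors differ by less than tol
    max_iter: int = 10_000,
) -> Tuple[List[float], List[int]]:
    """Standard value iteration for a discounted, finite, unconstrained MDP."""
    n_states = len(R)
    n_actions = len(P)

    def q_values(V: List[float]) -> List[List[float]]:
        # One Bellman backup: Q[s][a] = R[s][a] + gamma * sum_t P[a][s][t] * V[t]
        return [
            [
                R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]

    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = [max(row) for row in q_values(V)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break

    # Greedy policy with respect to the final value estimate.
    Q = q_values(V)
    policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return V, policy


# Hypothetical two-state example: in state 0, action 0 stays put (reward 0)
# and action 1 moves to state 1 (reward 1); state 1 is absorbing with reward 2.
P_example = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
]
R_example = [[0.0, 1.0], [2.0, 2.0]]
values, policy = value_iteration(P_example, R_example, gamma=0.5)
```

In this toy instance the absorbing state is worth 2 / (1 - 0.5) = 4, so moving there from state 0 is worth 1 + 0.5 * 4 = 3, and the greedy policy picks action 1 in state 0.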


Pages: 2813 - 2817

Number of pages: 5

50 items in total

[1] Approximation algorithms for budget-constrained auctions
Garg, R
Kumar, V
Pandit, V
[J]. APPROXIMATION, RANDOMIZATION, AND COMBINATORIAL OPTIMIZATION: ALGORITHMS AND TECHNIQUES, 2001, 2129 : 102 - 113
[2] Algorithms for budget-constrained survivable topology design
Garg, N
Simha, R
Xing, WX
[J]. 2002 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, VOLS 1-5, CONFERENCE PROCEEDINGS, 2002, : 2162 - 2166
[3] Efficient algorithms for Risk-Sensitive Markov Decision Processes with limited budget
Melo Moreira, Daniel A.
Delgado, Karina Valdivia
de Barros, Leliane Nunes
Maua, Denis Deratani
[J]. INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2021, 139 : 143 - 165
[4] BUDGET-CONSTRAINED PARETO-EFFICIENT ALLOCATIONS
BALASKO, Y
[J]. JOURNAL OF ECONOMIC THEORY, 1979, 21 (03) : 359 - 379
[5] THE EXISTENCE OF BUDGET-CONSTRAINED PARETO-EFFICIENT ALLOCATIONS
SVENSSON, LG
[J]. JOURNAL OF ECONOMIC THEORY, 1984, 32 (02) : 346 - 350
[6] Budget-constrained search
Manning, R
Manning, JRA
[J]. EUROPEAN ECONOMIC REVIEW, 1997, 41 (09) : 1817 - 1834
[7] BUDGET-CONSTRAINED PARETO EFFICIENT ALLOCATIONS - A DYNAMIC STORY
BALASKO, Y
[J]. JOURNAL OF ECONOMIC THEORY, 1982, 27 (01) : 239 - 242
[8] Constrained Multiagent Markov Decision Processes: a Taxonomy of Problems and Algorithms
de Nijs, Frits
Walraven, Erwin
de Weerdt, Mathijs M.
Spaan, Matthijs T. J.
[J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2021, 70 : 955 - 1001
[9] Learning algorithms for finite horizon constrained markov decision processes
Mittal, A.
Hemachandra, N.
[J]. JOURNAL OF INDUSTRIAL AND MANAGEMENT OPTIMIZATION, 2007, 3 (03) : 429 - 444
[10] Constrained multiagent Markov decision processes: A taxonomy of problems and algorithms
de Nijs, Frits
Walraven, Erwin
de Weerdt, Mathijs M.
Spaan, Matthijs T.J.
[J]. Journal of Artificial Intelligence Research, 2021, 70 : 955 - 1001
