Computing a Classic Index for Finite-Horizon Bandits

Cited by: 21

Authors:

Nino-Mora, Jose [1]

Affiliations:

[1] Univ Carlos III Madrid, Dept Stat, Madrid 28903, Spain

Source:

INFORMS JOURNAL ON COMPUTING | 2011 / Vol. 23 / Issue 02

Keywords:

dynamic programming; Markov; bandits; finite-horizon; index policies; analysis of algorithms; computational complexity;

DOI:

10.1287/ijoc.1100.0398

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

This paper considers the efficient exact computation of the counterpart of the Gittins index for a finite-horizon discrete-state bandit, which measures for each initial state the average productivity, given by the maximum ratio of expected total discounted reward earned to expected total discounted time expended that can be achieved through a number of successive plays stopping by the given horizon. Besides characterizing optimal policies for the finite-horizon one-armed bandit problem, such an index provides a suboptimal heuristic index rule for the intractable finite-horizon multiarmed bandit problem, which represents the natural extension of the Gittins index rule (optimal in the infinite-horizon case). Although such a finite-horizon index was introduced in classic work in the 1950s, investigation of its efficient exact computation has received scant attention. This paper introduces a recursive adaptive-greedy algorithm using only arithmetic operations that computes the index in (pseudo-)polynomial time in the problem parameters (number of project states and time horizon length). In the special case of a project with limited transitions per state, the complexity is either reduced or depends only on the length of the time horizon. The proposed algorithm is benchmarked in a computational study against the conventional calibration method.
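As an illustration of the index defined in the abstract, the following minimal sketch approximates it by calibration: bisection on a per-play charge `lam` until the optimal stopping problem with a forced first play has value zero. This is not the paper's recursive adaptive-greedy algorithm, and it is only an assumed form of the calibration approach the abstract mentions as a benchmark. The function name, its arguments (`R` for state rewards, `P` for the transition matrix, `beta` for the discount factor), and the two-state example are all hypothetical.

```python
def finite_horizon_index(R, P, beta, state, horizon, tol=1e-10):
    """Approximate the finite-horizon index of `state` with `horizon` plays allowed.

    Hypothetical helper: R[i] is the reward for playing in state i, P[i][j] the
    transition probability, beta the discount factor. The index is the charge
    `lam` at which playing once and then stopping optimally breaks even.
    """
    n = len(R)

    def first_play_value(lam):
        # V[i]: optimal value of the stopping problem with rewards R - lam,
        # built up by backward induction over the remaining plays.
        V = [0.0] * n
        for _ in range(horizon - 1):
            V = [
                max(0.0, R[i] - lam + beta * sum(P[i][j] * V[j] for j in range(n)))
                for i in range(n)
            ]
        # The first play is forced; afterwards stopping is optimal.
        return R[state] - lam + beta * sum(P[state][j] * V[j] for j in range(n))

    # The value is decreasing in lam, nonnegative at min(R), nonpositive at max(R).
    lo, hi = min(R), max(R)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if first_play_value(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative two-state project: state 0 pays 0 and moves to state 1,
# which pays 1 and is absorbing.
R = [0.0, 1.0]
P = [[0.0, 1.0], [0.0, 1.0]]
index_two_plays = finite_horizon_index(R, P, beta=0.5, state=0, horizon=2)
```

In this example, stopping after one play gives a ratio of 0, while two plays give (0 + 0.5 · 1) / (1 + 0.5) = 1/3, so `index_two_plays` is approximately 1/3. Bisection yields only an approximation to within `tol`, whereas the abstract's algorithm is described as exact.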

Pages: 254-267

Page count: 14

50 items in total

[1] Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits
Xiong, Guojun
Li, Jian
Singh, Rahul
[J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8726 - 8734
[2] On computing the L2-induced norm of finite-horizon systems
Bamieh, B
[J]. 42ND IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-6, PROCEEDINGS, 2003, : 1860 - 1862
[3] A FINITE-HORIZON MONETARY ECONOMY
KULTTI, K
[J]. JOURNAL OF ECONOMIC DYNAMICS & CONTROL, 1995, 19 (1-2) : 237 - 251
[4] THE FINITE-HORIZON WAR OF ATTRITION
CANNINGS, C
WHITTAKER, JC
[J]. GAMES AND ECONOMIC BEHAVIOR, 1995, 11 (02) : 193 - 236
[5] A marginal productivity index policy for the finite-horizon multiarmed bandit problem
Nino-Mora, Jose
[J]. 2005 44th IEEE Conference on Decision and Control & European Control Conference, Vols 1-8, 2005, : 1718 - 1722
[6] Computing L2-gain of finite-horizon systems with boundary conditions
Fujioka, Hisaya
[J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2007, 52 (04) : 697 - 702
[7] FINITE-HORIZON APPROXIMATES OF INFINITE HORIZON STOCHASTIC PROGRAMS
FLAM, SD
WETS, RJB
[J]. LECTURE NOTES IN CONTROL AND INFORMATION SCIENCES, 1986, 81 : 339 - 350
[8] THE FLEXIBLE ACCELERATOR AND OPTIMIZATION WITH A FINITE-HORIZON
VONZURMUEHLEN, P
[J]. ECONOMICS LETTERS, 1980, 5 (01) : 21 - 27
[9] Finite-horizon equipment replacement analysis
Hartman, JC
Murphy, A
[J]. IIE TRANSACTIONS, 2006, 38 (05) : 409 - 419
[10] A NOTE ON CHECKING SCHEDULES WITH FINITE-HORIZON
VISCOLANI, B
[J]. RAIRO-RECHERCHE OPERATIONNELLE-OPERATIONS RESEARCH, 1991, 25 (02) : 203 - 208
