Computing a Classic Index for Finite-Horizon Bandits

Cited by: 21
Author
Nino-Mora, Jose [1]
Affiliation
[1] Univ Carlos III Madrid, Dept Stat, Madrid 28903, Spain
Keywords
dynamic programming; Markov; bandits; finite-horizon; index policies; analysis of algorithms; computational complexity
DOI
10.1287/ijoc.1100.0398
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper considers the efficient exact computation of the counterpart of the Gittins index for a finite-horizon discrete-state bandit, which measures for each initial state the average productivity, given by the maximum ratio of expected total discounted reward earned to expected total discounted time expended that can be achieved through a number of successive plays stopping by the given horizon. Besides characterizing optimal policies for the finite-horizon one-armed bandit problem, such an index provides a suboptimal heuristic index rule for the intractable finite-horizon multiarmed bandit problem, which represents the natural extension of the Gittins index rule (optimal in the infinite-horizon case). Although such a finite-horizon index was introduced in classic work in the 1950s, investigation of its efficient exact computation has received scant attention. This paper introduces a recursive adaptive-greedy algorithm using only arithmetic operations that computes the index in (pseudo-)polynomial time in the problem parameters (number of project states and time horizon length). In the special case of a project with limited transitions per state, the complexity is either reduced or depends only on the length of the time horizon. The proposed algorithm is benchmarked in a computational study against the conventional calibration method.
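The index described in the abstract can be illustrated with a small sketch of the calibration idea, which the abstract names as the conventional baseline. This is not the paper's recursive adaptive-greedy algorithm. It assumes a project with a reward vector `R`, a row-stochastic transition matrix `P`, a discount factor `beta`, and `T` remaining periods; all of these names, and the function `finite_horizon_index`, are hypothetical. For a per-play charge `lam`, the sketch solves the optimal stopping problem by backward induction and uses bisection to find the charge at which playing at least once from state `i` breaks even. That charge equals the maximum ratio of expected discounted reward to expected discounted time.

```python
def finite_horizon_index(R, P, beta, T, i, tol=1e-12, max_iter=200):
    """Calibration sketch (illustrative only, not the paper's algorithm).

    R: list of per-state rewards (assumed input)
    P: row-stochastic transition matrix as a list of lists (assumed input)
    beta: discount factor, T: number of remaining periods (T >= 1)
    i: initial state whose index is wanted
    """
    n = len(R)

    def forced_play_value(lam):
        # V[s] = best net value from state s when stopping is allowed
        # immediately; start from 0 periods remaining.
        V = [0.0] * n
        for _ in range(T - 1):
            V = [
                max(0.0, R[s] - lam + beta * sum(P[s][j] * V[j] for j in range(n)))
                for s in range(n)
            ]
        # Net value of playing once in state i, then stopping optimally
        # within the remaining T - 1 periods.
        return R[i] - lam + beta * sum(P[i][j] * V[j] for j in range(n))

    # The break-even charge lies between the smallest and largest reward,
    # and forced_play_value is decreasing in lam, so bisection applies.
    lo, hi = min(R), max(R)
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if forced_play_value(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # State 0 pays 0 and moves to state 1; state 1 pays 1 and is absorbing.
    R = [0.0, 1.0]
    P = [[0.0, 1.0], [0.0, 1.0]]
    print(finite_horizon_index(R, P, beta=0.5, T=2, i=0))
```

In this two-state example, stopping after one play gives a ratio of 0 and stopping after two plays gives (0 + 0.5 · 1) / (1 + 0.5) = 1/3, so the printed value should be close to 1/3. Each bisection step repeats a full backward induction over the horizon, which is the kind of cost an exact method such as the one the paper proposes is meant to avoid.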
Pages: 254-267
Number of pages: 14