Robust control of the multi-armed bandit problem

Cited by: 3
Authors
Caro, Felipe [1 ]
Das Gupta, Aparupa [1 ]
Affiliations
[1] Univ Calif Los Angeles, Anderson Sch Management, Los Angeles, CA 90095 USA
Keywords
Multiarmed bandit; Index policies; Bellman equation; Robust Markov decision processes; Uncertain transition matrix; Project selection; Markov decision processes; Optimal adaptive policies; Allocation
DOI
10.1007/s10479-015-1965-7
Chinese Library Classification (CLC)
C93 [Management Science]; O22 [Operations Research]
Subject Classification Codes
070105; 12; 1201; 1202; 120202
Abstract
We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that for each arm there exists a robust counterpart of the Gittins index that is the solution to a robust optimal stopping-time problem and can be computed effectively via an equivalent restart problem. We then characterize the optimal policy of the robust MAB as a project-by-project retirement policy, but we show that the arms become dependent, so the policy based on the robust Gittins index is not optimal. For a project selection problem, we show that the robust Gittins index policy is near optimal, but its implementation requires more computational effort than solving a non-robust MAB problem. Hence, we propose a Lagrangian index policy that requires the same computational effort as evaluating the indices of a non-robust MAB and is within 1% of the optimum in the robust project selection problem.
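The restart-problem route mentioned in the abstract can be made concrete with a small numerical sketch. The Python snippet below is an illustration only, not the authors' algorithm: it runs value iteration on the classical restart-in-state formulation of the Gittins index while pessimizing each continuation value over a finite set of candidate transition matrices. The function name robust_gittins_indices, the finite candidate set, and the example numbers are assumptions made for this sketch; the paper works with general subsets of the probability simplex.

```python
import numpy as np


def robust_gittins_indices(P_candidates, r, beta, tol=1e-9, max_iter=10000):
    """Worst-case Gittins-style indices for one arm, computed by value
    iteration on the restart-in-state formulation, with the expectation
    replaced by a minimum over a finite set of candidate transition
    matrices (a deliberately simple stand-in for a general ambiguity set).

    P_candidates : list of (n, n) row-stochastic numpy arrays
    r            : length-n reward vector (one reward per arm state)
    beta         : discount factor in (0, 1)
    """
    P_candidates = [np.asarray(P, dtype=float) for P in P_candidates]
    r = np.asarray(r, dtype=float)
    n = r.size
    indices = np.zeros(n)

    for i in range(n):  # one restart problem per arm state i
        V = np.zeros(n)
        for _ in range(max_iter):
            # Worst-case continuation value from each state s:
            # r[s] + beta * min over candidate matrices of (P @ V)[s]
            cont = r + beta * np.min([P @ V for P in P_candidates], axis=0)
            # Restarting behaves as if the arm were currently in state i.
            V_new = np.maximum(cont, cont[i])
            if np.max(np.abs(V_new - V)) < tol:
                V = V_new
                break
            V = V_new
        # Normalization from the classical restart formulation:
        # index of state i = (1 - beta) * value at the restart state.
        indices[i] = (1.0 - beta) * V[i]
    return indices


# Tiny two-state illustration with two candidate transition matrices.
P1 = np.array([[0.9, 0.1], [0.2, 0.8]])
P2 = np.array([[0.7, 0.3], [0.4, 0.6]])
print(robust_gittins_indices([P1, P2], r=[1.0, 0.0], beta=0.9))
```

With a single candidate matrix the recursion reduces to the standard restart computation of the Gittins index, where the index of state i equals (1 - beta) times the value of the problem that is allowed to restart the arm in state i.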
Pages: 461-480
Number of pages: 20
Related papers (50 in total)
  • [1] Robust control of the multi-armed bandit problem
    Felipe Caro
    Aparupa Das Gupta
    Annals of Operations Research, 2022, 317: 461-480
  • [2] The budgeted multi-armed bandit problem
    Madani, O
    Lizotte, DJ
    Greiner, R
    Learning Theory, Proceedings, 2004, 3120: 643-645
  • [3] The multi-armed bandit problem with covariates
    Perchet, Vianney
    Rigollet, Philippe
    Annals of Statistics, 2013, 41 (02): 693-721
  • [4] Robust Trajectory Selection for Rearrangement Planning as a Multi-Armed Bandit Problem
    Koval, Michael C.
    King, Jennifer E.
    Pollard, Nancy S.
    Srinivasa, Siddhartha S.
    2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015: 2678-2685
  • [5] On multi-armed bandit problem with nuisance parameter
    孙嘉阳
    Science China Mathematics, 1986, (05): 464-475
  • [6] On multi-armed bandit problem with nuisance parameter
    孙嘉阳
    Science in China, Series A, 1986, (05): 464-475
  • [7] An Adaptive Algorithm in Multi-Armed Bandit Problem
    Zhang X.
    Zhou Q.
    Liang B.
    Xu J.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56 (03): 643-654
  • [8] Multi-armed bandit problem with known trend
    Bouneffouf, Djallel
    Feraud, Raphael
    Neurocomputing, 2016, 205: 16-21
  • [9] Achieving Fairness in the Stochastic Multi-Armed Bandit Problem
    Patil, Vishakha
    Ghalme, Ganesh
    Nair, Vineet
    Narahari, Y.
    Journal of Machine Learning Research, 2021, 22
  • [10] Adaptive Active Learning as a Multi-armed Bandit Problem
    Czarnecki, Wojciech M.
    Podolak, Igor T.
    21st European Conference on Artificial Intelligence (ECAI 2014), 2014, 263: 989-990