Robust control of the multi-armed bandit problem

Cited by: 3
Authors
Caro, Felipe [1]
Das Gupta, Aparupa [1]
Affiliations
[1] Univ Calif Los Angeles, Anderson Sch Management, Los Angeles, CA 90095 USA
Keywords
Multiarmed bandit; Index policies; Bellman equation; Robust Markov decision processes; Uncertain transition matrix; Project selection; MARKOV DECISION-PROCESSES; OPTIMAL ADAPTIVE POLICIES; ALLOCATION;
DOI
10.1007/s10479-015-1965-7
Chinese Library Classification number
C93 [Management Science]; O22 [Operations Research]
Discipline classification code
070105; 12; 1201; 1202; 120202
Abstract
We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that for each arm there exists a robust counterpart of the Gittins index that is the solution to a robust optimal stopping-time problem and can be computed effectively with an equivalent restart problem. We then characterize the optimal policy of the robust MAB as a project-by-project retirement policy, but we show that arms become dependent, so the policy based on the robust Gittins index is not optimal. For a project selection problem, we show that the robust Gittins index policy is near optimal, but its implementation requires more computational effort than solving a non-robust MAB problem. Hence, we propose a Lagrangian index policy that requires the same computational effort as evaluating the indices of a non-robust MAB and is within 1% of the optimum in the robust project selection problem.
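The restart idea mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' algorithm, which this record does not describe. It takes the classical restart formulation of the Gittins index (at every step, either continue from the current state or restart from the state whose index is wanted) and replaces the expected continuation value with a worst case over an uncertainty set. The function name `robust_restart_index`, the representation of each state's uncertainty set as a finite list of candidate transition rows, and the use of plain value iteration are all assumptions made for illustration.

```python
import numpy as np


def robust_restart_index(rewards, candidate_rows, beta, state,
                         tol=1e-10, max_iter=100_000):
    """Worst-case restart-problem index of `state` for a single arm.

    rewards[j]        : one-step reward in state j.
    candidate_rows[j] : array-like of shape (m_j, n); an ASSUMED finite
                        uncertainty set of transition rows out of state j.
    beta              : discount factor in (0, 1).
    """
    r = np.asarray(rewards, dtype=float)
    n = len(r)
    v = np.zeros(n)
    for _ in range(max_iter):
        # Nature picks, for each state, the row that minimizes the
        # expected continuation value.
        worst = np.array(
            [np.min(np.asarray(candidate_rows[j], dtype=float) @ v)
             for j in range(n)]
        )
        cont = r + beta * worst
        # Restart option: from any state, act as if the arm were in `state`.
        new_v = np.maximum(cont, cont[state])
        converged = np.max(np.abs(new_v - v)) < tol
        v = new_v
        if converged:
            break
    # Scale the restart value back to a per-period reward rate.
    return (1.0 - beta) * v[state]


# Hypothetical two-state arm: state 0 pays 1 forever, state 1 pays 0.
rewards = [1.0, 0.0]
# Nominal model: state 1 moves to state 0 with certainty.
nominal = [[[1.0, 0.0]], [[1.0, 0.0]]]
# Ambiguous model: state 1 may instead stay in state 1.
ambiguous = [[[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]]

print(robust_restart_index(rewards, nominal, 0.9, state=1))
print(robust_restart_index(rewards, ambiguous, 0.9, state=1))
```

In this toy instance the nominal index of state 1 comes out at approximately 0.9 (the discount factor), while the worst-case index drops to approximately 0 because nature can keep the arm in the zero-reward state. With a single candidate row per state, the routine reduces to the non-robust restart computation.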
Pages: 461-480
Page count: 20