Robust control of the multi-armed bandit problem

Cited by: 3
Authors
Caro, Felipe [1]
Das Gupta, Aparupa [1]
Affiliations
[1] Univ Calif Los Angeles, Anderson Sch Management, Los Angeles, CA 90095 USA
Keywords
Multiarmed bandit; Index policies; Bellman equation; Robust Markov decision processes; Uncertain transition matrix; Project selection; MARKOV DECISION-PROCESSES; OPTIMAL ADAPTIVE POLICIES; ALLOCATION
DOI
10.1007/s10479-015-1965-7
Chinese Library Classification (CLC) number
C93 [Management science]; O22 [Operations research]
Discipline classification code
070105; 12; 1201; 1202; 120202
Abstract
We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that for each arm there exists a robust counterpart of the Gittins index that is the solution to a robust optimal stopping-time problem and can be computed effectively with an equivalent restart problem. We then characterize the optimal policy of the robust MAB as a project-by-project retirement policy, but we show that arms become dependent, so the policy based on the robust Gittins index is not optimal. For a project selection problem, we show that the robust Gittins index policy is near optimal, but its implementation requires more computational effort than solving a non-robust MAB problem. Hence, we propose a Lagrangian index policy that requires the same computational effort as evaluating the indices of a non-robust MAB and is within 1% of the optimum in the robust project selection problem.
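The restart-problem idea mentioned in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes, purely for illustration, that the uncertainty set is a finite list of candidate transition matrices with the worst case taken independently for each state, and it runs plain value iteration on a restart-in-state recursion. The function name robust_restart_index and its parameters (rewards, scenarios, state, beta) are hypothetical.

```python
import numpy as np


def robust_restart_index(rewards, scenarios, state, beta=0.9,
                         tol=1e-10, max_iter=100_000):
    """Worst-case restart-in-`state` index for a single arm (illustrative).

    rewards   : shape (n,)      one-step reward of each state
    scenarios : shape (m, n, n) ASSUMED finite set of candidate transition
                matrices; the adversary picks the worst row per state
    state     : the state whose index is wanted
    beta      : discount factor in (0, 1)
    """
    r = np.asarray(rewards, dtype=float)
    P = np.asarray(scenarios, dtype=float)
    v = np.zeros(len(r))
    for _ in range(max_iter):
        # Worst-case value of continuing from each state: the minimum over
        # candidate matrices of the expected next-step value.
        cont = r + beta * (P @ v).min(axis=0)
        # In every state the controller may instead restart from `state`.
        new_v = np.maximum(cont, cont[state])
        converged = np.max(np.abs(new_v - v)) < tol
        v = new_v
        if converged:
            break
    # Scale the restart value to a per-period reward rate.
    return (1.0 - beta) * v[state]


if __name__ == "__main__":
    rewards = [0.0, 1.0]
    move = np.array([[0.0, 1.0], [0.0, 1.0]])  # state 0 moves to state 1
    stay = np.eye(2)                           # state 0 never leaves
    # Nominal arm (one matrix): the index of state 0 is roughly beta = 0.9.
    print(robust_restart_index(rewards, [move], state=0))
    # Ambiguous arm: the adversary keeps state 0 stuck, so the index drops to 0.
    print(robust_restart_index(rewards, [move, stay], state=0))
```

With a single candidate matrix the recursion reduces to the classical restart formulation of the Gittins index; enlarging the scenario set can only lower the value, which matches the worst-case reading of the robust index.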
Pages: 461-480
Number of pages: 20