The multi-armed bandit, with constraints

Cited: 0
Authors
Eric V. Denardo
Eugene A. Feinberg
Uriel G. Rothblum
Institutions
[1] Yale University, Center for Systems Sciences
[2] Stony Brook University, Department of Applied Mathematics and Statistics
[3] Technion—Israel Institute of Technology, Late of the Faculty of Industrial Engineering and Management
Keywords
Optimal Policy; Column Generation; Priority Rule; Initial Randomization; Bandit Problem
Abstract
This paper presents a self-contained analysis of a Markov decision problem known as the multi-armed bandit. The analysis covers the cases of linear and exponential utility functions. The optimal policy is shown to have a simple and easily implemented form. Procedures are presented for computing such a policy and for computing the expected utility it earns from any starting state. For the case of linear utility, constraints that link the bandits are introduced, and the constrained optimization problem is solved via column generation. The methodology is novel in several respects, including the use of elementary row operations to simplify arguments.
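The "simple and easily-implemented form" of the optimal policy is a priority rule: at each step, play the arm whose current state carries the highest precomputed priority index. The sketch below illustrates that rule only, not the paper's computational procedures; the index, reward, and transition tables are hypothetical, and transitions are taken as deterministic for simplicity.

```python
def priority_rule(start_states, indices, rewards, transitions, horizon, discount=0.9):
    """Simulate a priority rule for a multi-armed bandit.

    indices[i][s]     : precomputed priority index of arm i in state s
    rewards[i][s]     : one-step reward for playing arm i in state s
    transitions[i][s] : next state of arm i after a play (deterministic here)

    Returns the discounted total reward and the final state vector.
    Only the played arm changes state; the others are frozen.
    """
    states = list(start_states)
    total, beta = 0.0, 1.0
    for _ in range(horizon):
        # Priority rule: choose the arm with the largest current index.
        i = max(range(len(states)), key=lambda a: indices[a][states[a]])
        total += beta * rewards[i][states[i]]
        states[i] = transitions[i][states[i]]
        beta *= discount
    return total, states
```

With two hypothetical arms, the rule plays arm 0 while its index dominates, then switches permanently to arm 1 once arm 0 drops to a low-index state, which is exactly the stay-until-the-index-falls behavior a priority rule prescribes.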
Pages: 37–62 (25 pages)
Related Papers (50 total)
  • [31] Multi-armed Bandit Problems with Strategic Arms
    Braverman, Mark; Mao, Jieming; Schneider, Jon; Weinberg, S. Matthew
    Conference on Learning Theory, Vol. 99, 2019
  • [32] Noise Free Multi-armed Bandit Game
    Nakamura, Atsuyoshi; Helmbold, David P.; Warmuth, Manfred K.
    Language and Automata Theory and Applications (LATA 2016), 2016, 9618: 412–423
  • [33] Robust control of the multi-armed bandit problem
    Caro, Felipe; Das Gupta, Aparupa
    Annals of Operations Research, 2022, 317: 461–480
  • [34] Characterizing Truthful Multi-armed Bandit Mechanisms
    Babaioff, Moshe; Sharma, Yogeshwer; Slivkins, Aleksandrs
    SIAM Journal on Computing, 2014, 43(1): 194–230
  • [35] Multi-Armed Recommender System Bandit Ensembles
    Canamares, Rocio; Redondo, Marcos; Castells, Pablo
    RecSys 2019: 13th ACM Conference on Recommender Systems, 2019: 432–436
  • [36] Multi-armed bandit problem with known trend
    Bouneffouf, Djallel; Feraud, Raphael
    Neurocomputing, 2016, 205: 16–21
  • [37] A Multi-Armed Bandit Hyper-Heuristic
    Ferreira, Alexandre Silvestre; Goncalves, Richard Aderbal; Ramirez Pozo, Aurora Trinidad
    2015 Brazilian Conference on Intelligent Systems (BRACIS 2015), 2015: 13–18
  • [38] Ambiguity aversion in multi-armed bandit problems
    Anderson, Christopher M.
    Theory and Decision, 2012, 72: 15–33
  • [39] Variational inference for the multi-armed contextual bandit
    Urteaga, Inigo; Wiggins, Chris H.
    International Conference on Artificial Intelligence and Statistics, Vol. 84, 2018
  • [40] An Incentive-Compatible Multi-Armed Bandit Mechanism
    Gonen, Rica; Pavlov, Elan
    PODC'07: Proceedings of the 26th Annual ACM Symposium on Principles of Distributed Computing, 2007: 362–363