The multi-armed bandit, with constraints

Cited: 0
Authors
Eric V. Denardo
Eugene A. Feinberg
Uriel G. Rothblum
Institutions
[1] Yale University, Center for Systems Sciences
[2] Stony Brook University, Department of Applied Mathematics and Statistics
[3] Technion—Israel Institute of Technology, Late of the Faculty of Industrial Engineering and Management
Keywords
Optimal Policy; Column Generation; Priority Rule; Initial Randomization; Bandit Problem
Abstract
This paper presents a self-contained analysis of a Markov decision problem known as the multi-armed bandit. The analysis covers the cases of linear and exponential utility functions. The optimal policy is shown to have a simple and easily implemented form. Procedures are presented for computing such a policy and for computing the expected utility it earns from any starting state. For the case of linear utility, constraints that link the bandits are introduced, and the constrained optimization problem is solved via column generation. The methodology is novel in several respects, including the use of elementary row operations to simplify arguments.
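The "simple and easily-implemented form" of the optimal policy is a priority rule: at each step, play the arm whose current state carries the highest precomputed priority index. The sketch below illustrates that rule only, not the paper's computational procedures; the index, reward, and transition tables are hypothetical, and transitions are taken as deterministic for simplicity.

```python
def priority_rule(start_states, indices, rewards, transitions, horizon, discount=0.9):
    """Simulate a priority rule for a multi-armed bandit.

    indices[i][s]     : precomputed priority index of arm i in state s
    rewards[i][s]     : one-step reward for playing arm i in state s
    transitions[i][s] : next state of arm i after a play (deterministic here)

    Returns the discounted total reward and the final state vector.
    Only the played arm changes state; the others are frozen.
    """
    states = list(start_states)
    total, beta = 0.0, 1.0
    for _ in range(horizon):
        # Priority rule: choose the arm with the largest current index.
        i = max(range(len(states)), key=lambda a: indices[a][states[a]])
        total += beta * rewards[i][states[i]]
        states[i] = transitions[i][states[i]]
        beta *= discount
    return total, states
```

With two hypothetical arms, the rule plays arm 0 while its index dominates, then switches permanently to arm 1 once arm 0 drops to a low-index state, which is exactly the stay-until-the-index-falls behavior a priority rule prescribes.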
Pages: 37–62 (25 pages)
Related Papers (50 total)
  • [31] Multi-armed Bandit Problems with Strategic Arms
    Braverman, Mark; Mao, Jieming; Schneider, Jon; Weinberg, S. Matthew
    Conference on Learning Theory, Vol. 99, 2019
  • [32] Noise Free Multi-armed Bandit Game
    Nakamura, Atsuyoshi; Helmbold, David P.; Warmuth, Manfred K.
    Language and Automata Theory and Applications (LATA 2016), 2016, 9618: 412–423
  • [33] Robust control of the multi-armed bandit problem
    Caro, Felipe; Das Gupta, Aparupa
    Annals of Operations Research, 2022, 317: 461–480
  • [34] Characterizing Truthful Multi-armed Bandit Mechanisms
    Babaioff, Moshe; Sharma, Yogeshwer; Slivkins, Aleksandrs
    SIAM Journal on Computing, 2014, 43(1): 194–230
  • [35] Multi-Armed Recommender System Bandit Ensembles
    Canamares, Rocio; Redondo, Marcos; Castells, Pablo
    RecSys 2019: 13th ACM Conference on Recommender Systems, 2019: 432–436
  • [36] Multi-armed bandit problem with known trend
    Bouneffouf, Djallel; Feraud, Raphael
    Neurocomputing, 2016, 205: 16–21
  • [37] A Multi-Armed Bandit Hyper-Heuristic
    Ferreira, Alexandre Silvestre; Goncalves, Richard Aderbal; Ramirez Pozo, Aurora Trinidad
    2015 Brazilian Conference on Intelligent Systems (BRACIS 2015), 2015: 13–18
  • [38] Ambiguity aversion in multi-armed bandit problems
    Anderson, Christopher M.
    Theory and Decision, 2012, 72: 15–33
  • [39] Variational inference for the multi-armed contextual bandit
    Urteaga, Inigo; Wiggins, Chris H.
    International Conference on Artificial Intelligence and Statistics, Vol. 84, 2018
  • [40] An Incentive-Compatible Multi-Armed Bandit Mechanism
    Gonen, Rica; Pavlov, Elan
    PODC'07: Proceedings of the 26th Annual ACM Symposium on Principles of Distributed Computing, 2007: 362–363