The multi-armed bandit, with constraints

Cited by: 0
Authors
Eric V. Denardo
Eugene A. Feinberg
Uriel G. Rothblum
Affiliations
[1] Yale University, Center for Systems Sciences
[2] Stony Brook University, Department of Applied Mathematics and Statistics
[3] Technion-Israel Institute of Technology, Late of the Faculty of Industrial Engineering and Management
Keywords
Optimal Policy; Column Generation; Priority Rule; Initial Randomization; Bandit Problem;
DOI: Not available
Abstract
Presented in this paper is a self-contained analysis of a Markov decision problem that is known as the multi-armed bandit. The analysis covers the cases of linear and exponential utility functions. The optimal policy is shown to have a simple and easily-implemented form. Procedures for computing such a policy are presented, as are procedures for computing the expected utility that it earns, given any starting state. For the case of linear utility, constraints that link the bandits are introduced, and the constrained optimization problem is solved via column generation. The methodology is novel in several respects, which include the use of elementary row operations to simplify arguments.
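The policy form the abstract refers to, a priority rule that always continues the bandit whose current state carries the highest index, is easy to state in code. The sketch below is a minimal illustration only: the Arm class, the priority_policy function, and the numerical index values are hypothetical placeholders, and it does not reproduce the paper's computational procedures, its exponential-utility case, or the column-generation step for linked constraints.

```python
import random


class Arm:
    """One bandit: a Markov chain with a state-dependent reward."""

    def __init__(self, rewards, transitions, start_state):
        self.rewards = rewards          # rewards[state] = reward earned when played in that state
        self.transitions = transitions  # transitions[state] = list of (next_state, probability)
        self.state = start_state

    def pull(self):
        """Collect the current state's reward, then move to a random successor state."""
        reward = self.rewards[self.state]
        next_states, probs = zip(*self.transitions[self.state])
        self.state = random.choices(next_states, weights=probs, k=1)[0]
        return reward


def priority_policy(arms, index):
    """Priority rule: play the arm whose current state has the largest index value."""
    return max(range(len(arms)), key=lambda i: index[i][arms[i].state])


if __name__ == "__main__":
    # Two toy arms with made-up dynamics; the index values are placeholders,
    # not indices computed by any method from the paper.
    arms = [
        Arm(rewards=[1.0, 0.2],
            transitions=[[(0, 0.5), (1, 0.5)], [(1, 1.0)]],
            start_state=0),
        Arm(rewards=[0.6, 0.6],
            transitions=[[(0, 1.0)], [(0, 1.0)]],
            start_state=0),
    ]
    index = [
        {0: 1.0, 1: 0.2},  # placeholder indices for arm 0's two states
        {0: 0.6, 1: 0.6},  # placeholder indices for arm 1's two states
    ]

    beta = 0.9   # discount factor
    total = 0.0
    for t in range(20):
        i = priority_policy(arms, index)
        total += (beta ** t) * arms[i].pull()
    print(f"Discounted reward collected by the priority rule: {total:.3f}")
```

Running the script simulates twenty pulls under a discount factor of 0.9 and prints the discounted reward collected; the point of the sketch is only that, once index values for each state are in hand, the policy itself reduces to a single argmax per period.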
Pages: 37-62
Page count: 25
Related Papers
50 records in total
  • [41] Patil, Vishakha; Ghalme, Ganesh; Nair, Vineet; Narahari, Y. Achieving Fairness in the Stochastic Multi-Armed Bandit Problem. Journal of Machine Learning Research, 2021, 22.
  • [42] Reverdy, Paul. Gaussian multi-armed bandit problems with multiple objectives. 2016 American Control Conference (ACC), 2016: 5263-5269.
  • [43] Liu, Keqin; Zhao, Qing. Decentralized Multi-Armed Bandit with Multiple Distributed Players. 2010 Information Theory and Applications Workshop (ITA), 2010: 568-577.
  • [44] Czarnecki, Wojciech M.; Podolak, Igor T. Adaptive Active Learning as a Multi-armed Bandit Problem. 21st European Conference on Artificial Intelligence (ECAI 2014), 2014, 263: 989-990.
  • [45] Shin, Suho; Lee, Seungjoon; Ok, Jungseul. Multi-armed Bandit Algorithm against Strategic Replication. International Conference on Artificial Intelligence and Statistics, 2022, 151: 403-431.
  • [46] Mordjana, Y.; Djamaa, B.; Senouci, M. R.; Herzallah, A. A Contextual Multi-Armed Bandit approach for NDN forwarding. Journal of Network and Computer Applications, 2024, 230.
  • [47] Scott, Steven L. Multi-armed bandit experiments in the online service economy. Applied Stochastic Models in Business and Industry, 2015, 31 (01): 37-45.
  • [48] Yan, Zirui; Xiao, Quan; Chen, Tianyi; Tajer, Ali. Federated Multi-Armed Bandit via Uncoordinated Exploration. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 5248-5252.
  • [49] Jia, Huiwen; Shi, Cong; Shen, Siqian. Multi-armed bandit with sub-exponential rewards. Operations Research Letters, 2021, 49 (05): 728-733.
  • [50] Chen, Ningyuan. Multi-armed Bandit Requiring Monotone Arm Sequences. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.