The multi-armed bandit, with constraints

Cited by: 0
Authors
Eric V. Denardo
Eugene A. Feinberg
Uriel G. Rothblum
Affiliations
[1] Yale University, Center for Systems Sciences
[2] Stony Brook University, Department of Applied Mathematics and Statistics
[3] Technion-Israel Institute of Technology, Late of the Faculty of Industrial Engineering and Management
Keywords
Optimal Policy; Column Generation; Priority Rule; Initial Randomization; Bandit Problem;
DOI: Not available
Abstract
Presented in this paper is a self-contained analysis of a Markov decision problem that is known as the multi-armed bandit. The analysis covers the cases of linear and exponential utility functions. The optimal policy is shown to have a simple and easily-implemented form. Procedures for computing such a policy are presented, as are procedures for computing the expected utility that it earns, given any starting state. For the case of linear utility, constraints that link the bandits are introduced, and the constrained optimization problem is solved via column generation. The methodology is novel in several respects, which include the use of elementary row operations to simplify arguments.
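The policy form the abstract refers to, a priority rule that always continues the bandit whose current state carries the highest index, is easy to state in code. The sketch below is a minimal illustration only: the Arm class, the priority_policy function, and the numerical index values are hypothetical placeholders, and it does not reproduce the paper's computational procedures, its exponential-utility case, or the column-generation step for linked constraints.

```python
import random


class Arm:
    """One bandit: a Markov chain with a state-dependent reward."""

    def __init__(self, rewards, transitions, start_state):
        self.rewards = rewards          # rewards[state] = reward earned when played in that state
        self.transitions = transitions  # transitions[state] = list of (next_state, probability)
        self.state = start_state

    def pull(self):
        """Collect the current state's reward, then move to a random successor state."""
        reward = self.rewards[self.state]
        next_states, probs = zip(*self.transitions[self.state])
        self.state = random.choices(next_states, weights=probs, k=1)[0]
        return reward


def priority_policy(arms, index):
    """Priority rule: play the arm whose current state has the largest index value."""
    return max(range(len(arms)), key=lambda i: index[i][arms[i].state])


if __name__ == "__main__":
    # Two toy arms with made-up dynamics; the index values are placeholders,
    # not indices computed by any method from the paper.
    arms = [
        Arm(rewards=[1.0, 0.2],
            transitions=[[(0, 0.5), (1, 0.5)], [(1, 1.0)]],
            start_state=0),
        Arm(rewards=[0.6, 0.6],
            transitions=[[(0, 1.0)], [(0, 1.0)]],
            start_state=0),
    ]
    index = [
        {0: 1.0, 1: 0.2},  # placeholder indices for arm 0's two states
        {0: 0.6, 1: 0.6},  # placeholder indices for arm 1's two states
    ]

    beta = 0.9   # discount factor
    total = 0.0
    for t in range(20):
        i = priority_policy(arms, index)
        total += (beta ** t) * arms[i].pull()
    print(f"Discounted reward collected by the priority rule: {total:.3f}")
```

Running the script simulates twenty pulls under a discount factor of 0.9 and prints the discounted reward collected; the point of the sketch is only that, once index values for each state are in hand, the policy itself reduces to a single argmax per period.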
Pages: 37-62
Page count: 25
Related Papers
50 records in total
  • [41] Patil, Vishakha; Ghalme, Ganesh; Nair, Vineet; Narahari, Y. Achieving Fairness in the Stochastic Multi-Armed Bandit Problem. Journal of Machine Learning Research, 2021, 22.
  • [42] Reverdy, Paul. Gaussian multi-armed bandit problems with multiple objectives. 2016 American Control Conference (ACC), 2016: 5263-5269.
  • [43] Liu, Keqin; Zhao, Qing. Decentralized Multi-Armed Bandit with Multiple Distributed Players. 2010 Information Theory and Applications Workshop (ITA), 2010: 568-577.
  • [44] Czarnecki, Wojciech M.; Podolak, Igor T. Adaptive Active Learning as a Multi-armed Bandit Problem. 21st European Conference on Artificial Intelligence (ECAI 2014), 2014, 263: 989-990.
  • [45] Shin, Suho; Lee, Seungjoon; Ok, Jungseul. Multi-armed Bandit Algorithm against Strategic Replication. International Conference on Artificial Intelligence and Statistics, 2022, 151: 403-431.
  • [46] Mordjana, Y.; Djamaa, B.; Senouci, M. R.; Herzallah, A. A Contextual Multi-Armed Bandit approach for NDN forwarding. Journal of Network and Computer Applications, 2024, 230.
  • [47] Scott, Steven L. Multi-armed bandit experiments in the online service economy. Applied Stochastic Models in Business and Industry, 2015, 31 (01): 37-45.
  • [48] Yan, Zirui; Xiao, Quan; Chen, Tianyi; Tajer, Ali. Federated Multi-Armed Bandit via Uncoordinated Exploration. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 5248-5252.
  • [49] Jia, Huiwen; Shi, Cong; Shen, Siqian. Multi-armed bandit with sub-exponential rewards. Operations Research Letters, 2021, 49 (05): 728-733.
  • [50] Chen, Ningyuan. Multi-armed Bandit Requiring Monotone Arm Sequences. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.