The multi-armed bandit, with constraints

Cited by: 0
Authors
Eric V. Denardo
Eugene A. Feinberg
Uriel G. Rothblum
Affiliations
[1] Yale University,Center for Systems Sciences
[2] Stony Brook University,Department of Applied Mathematics and Statistics
[3] Technion—Israel Institute of Technology,Late of the Faculty of Industrial Engineering and Management
Keywords
Optimal Policy; Column Generation; Priority Rule; Initial Randomization; Bandit Problem;
DOI
Not available
Abstract
Presented in this paper is a self-contained analysis of a Markov decision problem that is known as the multi-armed bandit. The analysis covers the cases of linear and exponential utility functions. The optimal policy is shown to have a simple and easily-implemented form. Procedures for computing such a policy are presented, as are procedures for computing the expected utility that it earns, given any starting state. For the case of linear utility, constraints that link the bandits are introduced, and the constrained optimization problem is solved via column generation. The methodology is novel in several respects, which include the use of elementary row operations to simplify arguments.
Pages: 37 - 62
Number of pages: 25
Related papers
50 items in total
  • [21] Robust control of the multi-armed bandit problem
    Caro, Felipe
    Das Gupta, Aparupa
    ANNALS OF OPERATIONS RESEARCH, 2022, 317 (02) : 461 - 480
  • [22] Achieving Privacy in the Adversarial Multi-Armed Bandit
    Tossou, Aristide C. Y.
    Dimitrakakis, Christos
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2653 - 2659
  • [23] Generic Outlier Detection in Multi-Armed Bandit
    Ban, Yikun
    He, Jingrui
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 913 - 923
  • [24] Anytime Algorithms for Multi-Armed Bandit Problems
    Kleinberg, Robert
    PROCEEDINGS OF THE SEVENTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, : 928 - 936
  • [25] An Adaptive Algorithm in Multi-Armed Bandit Problem
    Zhang X.
    Zhou Q.
    Liang B.
    Xu J.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56 (03) : 643 - 654
  • [26] A Multi-Armed Bandit Strategy for Countermeasure Selection
    Cochrane, Madeleine
    Hunjet, Robert
    2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2020, : 2510 - 2515
  • [27] DBA: Dynamic Multi-Armed Bandit Algorithm
    Nobari, Sadegh
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9869 - 9870
  • [28] Percentile optimization in multi-armed bandit problems
    Ghatrani, Zahra
    Ghate, Archis
    ANNALS OF OPERATIONS RESEARCH, 2024, 340 (2-3) : 837 - 862
  • [29] Multi-armed Bandit Mechanism with Private Histories
    Liu, Chang
    Cai, Qingpeng
    Zhang, Yukui
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 1607 - 1609
  • [30] Ambiguity aversion in multi-armed bandit problems
    Anderson, Christopher M.
    THEORY AND DECISION, 2012, 72 (01) : 15 - 33