The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

被引:0
|
作者
Bayati, Mohsen [1 ]
Hamidi, Nima [1 ]
Johari, Ramesh [1 ]
Khosravi, Khashayar [2 ]
机构
[1] Stanford Univ, Stanford, CA USA
[2] Google Res NYC, Mountain View, CA 94043 USA
基金
美国国家科学基金会;
关键词
ALLOCATION;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We study the structure of regret-minimizing policies in the many-armed Bayesian multi-armed bandit problem: in particular, with k the number of arms and T the time horizon, we consider the case where >= root T. We first show that subsampling is a critical step for designing optimal policies. In particular, the standard UCB algorithm leads to sub-optimal regret bounds in the many-armed regime. However, a subsampled UCB (SS-UCB), which samples Theta(root T) arms and executes UCB only on that subset, is rate-optimal. Despite theoretically optimal regret, even SS-UCB performs poorly due to excessive exploration of suboptimal arms. In particular, in numerical experiments SS-UCB performs worse than a simple greedy algorithm (and its subsampled version) that pulls the current empirical best arm at every time period. We show that these insights hold even in a contextual setting, using real-world data. These empirical results suggest a novel form of free exploration in the many-armed regime that benefits greedy algorithms. We theoretically study this new source of free exploration and find that it is deeply connected to the distribution of a certain tail event for the prior distribution of arm rewards. This is a fundamentally distinct phenomenon from free exploration as discussed in the recent literature on contextual bandits, where free exploration arises due to variation in contexts. We use this insight to prove that the subsampled greedy algorithm is rate-optimal for Bernoulli bandits when k > root T, and achieves sublinear regret with more general distributions. This is a case where theoretical rate optimality does not tell the whole story: when complemented by the empirical observations of our paper, the power of greedy algorithms becomes quite evident. Taken together, from a practical standpoint, our results suggest that in applications it may be preferable to use a variant of the greedy algorithm in the many-armed regime.
引用
收藏
页数:11
相关论文
共 50 条
  • [1] Identifying Outlier Arms in Multi-Armed Bandit
    Zhuang, Honglei
    Wang, Chi
    Wang, Yifan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [2] Scaling Multi-Armed Bandit Algorithms
    Fouche, Edouard
    Komiyama, Junpei
    Boehm, Klemens
    KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, : 1449 - 1459
  • [3] Multi-armed Bandit Problems with Strategic Arms
    Braverman, Mark
    Mao, Jieming
    Schneider, Jon
    Weinberg, S. Matthew
    CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
  • [4] Multi-armed bandit algorithms and empirical evaluation
    Vermorel, J
    Mohri, M
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 437 - 448
  • [5] Anytime Algorithms for Multi-Armed Bandit Problems
    Kleinberg, Robert
    PROCEEDINGS OF THE SEVENTHEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, : 928 - 936
  • [6] Quantum greedy algorithms for multi-armed bandits
    Hiroshi Ohno
    Quantum Information Processing, 22
  • [7] Quantum greedy algorithms for multi-armed bandits
    Ohno, Hiroshi
    QUANTUM INFORMATION PROCESSING, 2023, 22 (02)
  • [8] MABFuzz: Multi-Armed Bandit Algorithms for Fuzzing Processors
    Gohil, Vasudev
    Kande, Rahul
    Chen, Chen
    Sadeghi, Ahmad-Reza
    Rajendran, Jeyavijayan
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024,
  • [9] Fair Link Prediction with Multi-Armed Bandit Algorithms
    Wang, Weixiang
    Soundarajan, Sucheta
    PROCEEDINGS OF THE 15TH ACM WEB SCIENCE CONFERENCE, WEBSCI 2023, 2023, : 219 - 228
  • [10] Multi-armed Bandit Algorithms for Adaptive Learning: A Survey
    Mui, John
    Lin, Fuhua
    Dewan, M. Ali Akber
    ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT II, 2021, 12749 : 273 - 278