The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Cited: 0
Authors:
Bayati, Mohsen [1 ]
Hamidi, Nima [1 ]
Johari, Ramesh [1 ]
Khosravi, Khashayar [2 ]
Affiliations:
[1] Stanford Univ, Stanford, CA USA
[2] Google Res NYC, Mountain View, CA 94043 USA
Funding:
National Science Foundation (USA)
Keywords:
ALLOCATION
DOI:
Not available
CLC number:
TP18 [Artificial Intelligence Theory]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
We study the structure of regret-minimizing policies in the many-armed Bayesian multi-armed bandit problem: in particular, with k the number of arms and T the time horizon, we consider the case where k ≥ √T. We first show that subsampling is a critical step for designing optimal policies. In particular, the standard UCB algorithm leads to suboptimal regret bounds in the many-armed regime. However, a subsampled UCB (SS-UCB), which samples Θ(√T) arms and executes UCB only on that subset, is rate-optimal. Despite its theoretically optimal regret, even SS-UCB performs poorly in practice due to excessive exploration of suboptimal arms: in numerical experiments, SS-UCB performs worse than a simple greedy algorithm (and its subsampled version) that pulls the current empirical best arm at every time period. We show that these insights hold even in a contextual setting, using real-world data. These empirical results suggest a novel form of free exploration in the many-armed regime that benefits greedy algorithms. We theoretically study this new source of free exploration and find that it is deeply connected to a certain tail event of the prior distribution of arm rewards. This is a fundamentally distinct phenomenon from free exploration as discussed in the recent literature on contextual bandits, where free exploration arises due to variation in contexts. We use this insight to prove that the subsampled greedy algorithm is rate-optimal for Bernoulli bandits when k > √T, and achieves sublinear regret with more general reward distributions. This is a case where theoretical rate optimality does not tell the whole story: when complemented by the empirical observations of our paper, the power of greedy algorithms becomes quite evident. Taken together, from a practical standpoint, our results suggest that in applications it may be preferable to use a variant of the greedy algorithm in the many-armed regime.
Pages: 11
Related papers (50 in total):
  • [21] Reconfigurable and Computationally Efficient Architecture for Multi-armed Bandit Algorithms
    Santosh, S. V. Sai
    Darak, S. J.
    2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2020,
  • [22] Strict greedy design paradigm applied to the stochastic multi-armed bandit problem
    Hong, Joey
    Machine Tool & Hydraulics, 2015, 43 (06): 1 - 6
  • [23] Correlated Gaussian Multi-Objective Multi-Armed Bandit across Arms Algorithm
    Yahyaa, Saba Q.
    Drugan, Madalina M.
    2015 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2015, : 593 - 600
  • [24] Online Interactive Collaborative Filtering Using Multi-Armed Bandit with Dependent Arms
    Wang, Qing
    Zeng, Chunqiu
    Zhou, Wubai
    Li, Tao
    Iyengar, S. S.
    Shwartz, Larisa
    Grabarnik, Genady Ya
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31 (08) : 1569 - 1580
  • [25] Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms
    Huyuk, Alihan
    Tekin, Cem
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [26] Auction-Based Combinatorial Multi-Armed Bandit Mechanisms with Strategic Arms
    Gao, Guoju
    Huang, He
    Xiao, Mingjun
    Wu, Jie
    Sun, Yu-E
    Zhang, Sheng
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2021), 2021,
  • [27] AB Testing for Process Versions with Contextual Multi-armed Bandit Algorithms
    Satyal, Suhrid
    Weber, Ingo
    Paik, Hye-Young
    Di Ciccio, Claudio
    Mendling, Jan
    ADVANCED INFORMATION SYSTEMS ENGINEERING, CAISE 2018, 2018, 10816 : 19 - 34
  • [28] Dynamic Multi-Armed Bandit with Covariates
    Pavlidis, Nicos G.
    Tasoulis, Dimitris K.
    Adams, Niall M.
    Hand, David J.
    ECAI 2008, PROCEEDINGS, 2008, 178: 777+
  • [29] Distributed Competitive Decision Making Using Multi-Armed Bandit Algorithms
    Almasri, Mahmoud
    Mansour, Ali
    Moy, Christophe
    Assoum, Ammar
    Le Jeune, Denis
    Osswald, Christophe
    WIRELESS PERSONAL COMMUNICATIONS, 2021, 118 (02) : 1165 - 1188
  • [30] The budgeted multi-armed bandit problem
    Madani, O
    Lizotte, DJ
    Greiner, R
    LEARNING THEORY, PROCEEDINGS, 2004, 3120 : 643 - 645