The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Cited by: 0
Authors
Bayati, Mohsen [1 ]
Hamidi, Nima [1 ]
Johari, Ramesh [1 ]
Khosravi, Khashayar [2 ]
Affiliations
[1] Stanford Univ, Stanford, CA USA
[2] Google Res NYC, Mountain View, CA 94043 USA
Funding
U.S. National Science Foundation;
Keywords
ALLOCATION;
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We study the structure of regret-minimizing policies in the many-armed Bayesian multi-armed bandit problem: in particular, with k the number of arms and T the time horizon, we consider the case where k ≥ √T. We first show that subsampling is a critical step for designing optimal policies. In particular, the standard UCB algorithm leads to sub-optimal regret bounds in the many-armed regime. However, a subsampled UCB (SS-UCB), which samples Θ(√T) arms and executes UCB only on that subset, is rate-optimal. Despite theoretically optimal regret, even SS-UCB performs poorly in practice due to excessive exploration of suboptimal arms. In particular, in numerical experiments SS-UCB performs worse than a simple greedy algorithm (and its subsampled version) that pulls the current empirical best arm at every time period. We show that these insights hold even in a contextual setting, using real-world data. These empirical results suggest a novel form of free exploration in the many-armed regime that benefits greedy algorithms. We theoretically study this new source of free exploration and find that it is deeply connected to the distribution of a certain tail event for the prior distribution of arm rewards. This is a fundamentally distinct phenomenon from free exploration as discussed in the recent literature on contextual bandits, where free exploration arises due to variation in contexts. We use this insight to prove that the subsampled greedy algorithm is rate-optimal for Bernoulli bandits when k > √T, and achieves sublinear regret with more general distributions. This is a case where theoretical rate optimality does not tell the whole story: when complemented by the empirical observations of our paper, the power of greedy algorithms becomes quite evident. Taken together, from a practical standpoint, our results suggest that in applications it may be preferable to use a variant of the greedy algorithm in the many-armed regime.
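The abstract fully specifies the two subsampled policies being compared (SS-UCB and subsampled greedy), so a small simulation may help make the comparison concrete. Below is a minimal sketch, assuming Bernoulli arms with means drawn i.i.d. from Uniform(0, 1) as the prior; the UCB1 bonus √(2 ln T / n), the seeds, the horizon, and the function names (simulate, make_ss_ucb, make_ss_greedy) are illustrative choices, not the paper's exact experimental setup.

```python
import numpy as np

def simulate(policy, k, T, rng):
    """One bandit instance: k Bernoulli arms, means drawn i.i.d. Uniform(0, 1)."""
    mu = rng.uniform(0.0, 1.0, size=k)   # arm means (the Bayesian prior draw)
    pulls = np.zeros(k)                  # number of pulls per arm
    succ = np.zeros(k)                   # cumulative reward per arm
    regret = 0.0
    for _ in range(T):
        a = policy(pulls, succ)
        reward = rng.random() < mu[a]    # Bernoulli(mu[a]) reward
        pulls[a] += 1
        succ[a] += reward
        regret += mu.max() - mu[a]       # pseudo-regret against the best arm
    return regret

def make_ss_ucb(m, T):
    """SS-UCB: run UCB1 restricted to m subsampled arms. Since the arm means
    are i.i.d., taking the first m indices is a uniform random subsample."""
    def policy(pulls, succ):
        sub = np.arange(m)
        untried = sub[pulls[sub] == 0]
        if untried.size:                 # pull each subsampled arm once first
            return untried[0]
        means = succ[sub] / pulls[sub]
        bonus = np.sqrt(2.0 * np.log(T) / pulls[sub])
        return sub[np.argmax(means + bonus)]
    return policy

def make_ss_greedy(m):
    """SS-Greedy: always pull the current empirical best of m subsampled arms."""
    def policy(pulls, succ):
        sub = np.arange(m)
        untried = sub[pulls[sub] == 0]
        if untried.size:
            return untried[0]
        return sub[np.argmax(succ[sub] / pulls[sub])]
    return policy

if __name__ == "__main__":
    T = 10_000
    k = 4 * int(np.sqrt(T))              # many-armed regime: k >= sqrt(T)
    m = int(np.sqrt(T))                  # subsample Theta(sqrt(T)) arms
    for name, pol in [("SS-UCB", make_ss_ucb(m, T)),
                      ("SS-Greedy", make_ss_greedy(m))]:
        r = np.mean([simulate(pol, k, T, np.random.default_rng(s))
                     for s in range(5)])
        print(f"{name}: mean regret over 5 runs = {r:.1f}")
```

In runs of this sketch, SS-Greedy tends to lock onto a near-optimal arm quickly: with Θ(√T) i.i.d. draws from the prior, some subsampled arm is very likely to have a near-maximal mean, which mirrors the tail-event intuition the abstract describes. Exact regret numbers depend on the prior, horizon, and subsample size.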
Pages: 11