Successive Reduction of Arms in Multi-Armed Bandits

Cited by: 1
Authors
Gupta, Neha [1]
Granmo, Ole-Christoffer [2]
Agrawala, Ashok [1]
Affiliations
[1] University of Maryland, College Park, MD 20742, USA
[2] University of Agder, Kristiansand, Norway
DOI
10.1007/978-1-4471-2318-7_13
Chinese Library Classification (CLC)
TP18 (Theory of Artificial Intelligence)
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The relevance of the multi-armed bandit problem has risen in the past few years with the need for online optimization techniques in Internet systems, such as online advertisement and news article recommendation. At the same time, these applications reveal that state-of-the-art solution schemes do not scale well with the number of bandit arms. In this paper, we present two types of Successive Reduction (SR) strategies: (1) Successive Reduction Hoeffding (SRH) and (2) Successive Reduction Order Statistics (SRO). Both use an order-statistics-based Thompson Sampling method for arm selection and then successively eliminate bandit arms from consideration based on a confidence threshold. While SRH uses Hoeffding bounds for elimination, SRO measures confidence by the probability that an arm is superior to the currently selected arm. A computationally efficient scheme for the pairwise calculation of the latter probability is also presented. With SR strategies, sampling resources and arm pulls are not wasted on arms that are unlikely to be optimal. To demonstrate the scalability of our proposed schemes, we compare them with two state-of-the-art approaches, namely pure Thompson Sampling and UCB-Tuned. The empirical results are conclusive: the performance advantage of the proposed SRO scheme increases steadily with the number of bandit arms, while SRH performs similarly to pure Thompson Sampling. We thus believe that SR algorithms will enable improved performance in Internet-based online optimization and the tackling of larger problems.
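To make the two elimination criteria concrete, below is a minimal Python sketch assuming Bernoulli rewards and Beta-Bernoulli Thompson Sampling. It illustrates the general SRH/SRO pattern the abstract describes, not the authors' exact algorithm: the pull callback, the delta threshold, and the per-step elimination schedule are hypothetical choices. The prob_superior helper implements a known closed form for P(X > Y) under two Beta posteriors, the kind of pairwise superiority probability an SRO-style rule thresholds on; the paper's own efficient pairwise scheme may differ.

```python
import math

import numpy as np
from scipy.special import betaln


def prob_superior(ax, bx, ay, by):
    """P(X > Y) for X ~ Beta(ax, bx), Y ~ Beta(ay, by), with integer ax.

    Known closed form for Beta inequalities; sketches the pairwise
    superiority probability an SRO-style elimination rule could use.
    """
    return sum(
        math.exp(betaln(ay + i, bx + by) - math.log(bx + i)
                 - betaln(1 + i, bx) - betaln(ay, by))
        for i in range(int(ax))
    )


def successive_reduction(pull, n_arms, horizon, delta=0.05):
    """Thompson Sampling with Hoeffding-based elimination (SRH-style sketch).

    `pull(arm)` is an assumed environment callback returning a 0/1 reward.
    """
    wins = np.ones(n_arms)    # Beta(1, 1) priors
    losses = np.ones(n_arms)
    active = list(range(n_arms))
    for _ in range(horizon):
        idx = np.array(active)
        # Thompson Sampling restricted to the surviving arms.
        samples = np.random.beta(wins[idx], losses[idx])
        arm = idx[np.argmax(samples)]
        reward = pull(arm)
        wins[arm] += reward
        losses[arm] += 1 - reward
        # Hoeffding elimination: drop arms whose upper confidence bound
        # falls below the best lower confidence bound among active arms.
        counts = wins[idx] + losses[idx] - 2.0   # actual pull counts
        if (counts > 0).all():
            means = (wins[idx] - 1.0) / counts
            radius = np.sqrt(np.log(2.0 / delta) / (2.0 * counts))
            best_lower = np.max(means - radius)
            active = [a for a, m, r in zip(idx, means, radius)
                      if m + r >= best_lower]
    return active
```

For example, successive_reduction(lambda a: int(np.random.rand() < mu[a]), n_arms=100, horizon=10_000) returns only the arms still statistically compatible with being optimal. The design point the sketch highlights is the one the abstract claims: elimination only shrinks the set Thompson Sampling samples over, so per-step cost falls as arms are discarded, which is where the scalability advantage over pure Thompson Sampling would come from.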
Pages: 181+
Page count: 2