Successive Reduction of Arms in Multi-Armed Bandits

Cited by: 1
Authors
Gupta, Neha [1 ]
Granmo, Ole-Christoffer [2 ]
Agrawala, Ashok [1 ]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] Univ Agder, Kristiansand, Norway
Keywords
DOI
10.1007/978-1-4471-2318-7_13
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The relevance of the multi-armed bandit problem has risen in recent years with the need for online optimization techniques in Internet systems, such as online advertisement and news article recommendation. At the same time, these applications reveal that state-of-the-art solution schemes do not scale well with the number of bandit arms. In this paper, we present two Successive Reduction (SR) strategies: (1) Successive Reduction Hoeffding (SRH) and (2) Successive Reduction Order Statistics (SRO). Both use an Order-Statistics-based Thompson Sampling method for arm selection, and then successively eliminate bandit arms from consideration based on a confidence threshold. While SRH uses Hoeffding bounds for elimination, SRO measures confidence by the probability that an arm is superior to the currently selected arm. A computationally efficient scheme for the pair-wise calculation of the latter probability is also presented. With SR strategies, sampling resources and arm pulls are not wasted on arms that are unlikely to be optimal. To demonstrate the scalability of the proposed schemes, we compare them with two state-of-the-art approaches, pure Thompson Sampling and UCB-Tuned. The empirical results are conclusive: the performance advantage of the proposed SRO scheme increases persistently with the number of bandit arms, while the SRH scheme performs similarly to pure Thompson Sampling. We therefore believe that SR algorithms will improve performance in Internet-based online optimization and enable tackling larger problems.
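The abstract's core idea — Thompson Sampling for arm selection combined with successive elimination of low-confidence arms — can be sketched as follows. This is a minimal illustration based only on the abstract, not the paper's exact algorithm: the Hoeffding elimination rule, the `delta` confidence parameter, and the Beta-Bernoulli posterior model are all assumptions, and `prob_superior` uses Monte Carlo sampling as a stand-in for the paper's efficient exact pair-wise scheme.

```python
import math
import random


class SuccessiveReductionBandit:
    """SRH-style sketch: Thompson Sampling selection plus
    Hoeffding-bound elimination of clearly inferior arms.
    The threshold details are assumptions, not the paper's rule."""

    def __init__(self, n_arms, delta=0.05):
        self.active = set(range(n_arms))   # arms still under consideration
        self.successes = [0] * n_arms
        self.pulls = [0] * n_arms
        self.delta = delta                 # confidence parameter (assumed)

    def select_arm(self):
        # Thompson Sampling over active arms only: draw one sample from
        # each arm's Beta posterior and play the arm with the largest draw.
        best, best_sample = None, -1.0
        for a in self.active:
            s = random.betavariate(self.successes[a] + 1,
                                   self.pulls[a] - self.successes[a] + 1)
            if s > best_sample:
                best, best_sample = a, s
        return best

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.successes[arm] += reward
        self._eliminate()

    def _radius(self, a):
        # Hoeffding confidence radius for arm a's empirical mean.
        n = self.pulls[a]
        if n == 0:
            return float('inf')
        return math.sqrt(math.log(2.0 / self.delta) / (2.0 * n))

    def _eliminate(self):
        # Drop any arm whose upper confidence bound falls below the
        # best lower confidence bound among active arms.
        if len(self.active) <= 1:
            return
        means = {a: self.successes[a] / self.pulls[a] if self.pulls[a] else 0.5
                 for a in self.active}
        best_lower = max(means[a] - self._radius(a) for a in self.active)
        self.active = {a for a in self.active
                       if means[a] + self._radius(a) >= best_lower}


def prob_superior(s1, n1, s2, n2, n_samples=10000):
    """Monte Carlo estimate of P(arm 1's mean > arm 2's mean) under
    Beta posteriors -- the SRO-style confidence measure. The paper
    presents an efficient exact scheme; this estimator is a stand-in."""
    wins = 0
    for _ in range(n_samples):
        x = random.betavariate(s1 + 1, n1 - s1 + 1)
        y = random.betavariate(s2 + 1, n2 - s2 + 1)
        wins += x > y
    return wins / n_samples
```

With a clearly inferior arm (say, one arm always rewarding and one never), the elimination step removes the bad arm after a handful of pulls, so later Thompson samples are drawn only for surviving arms — which is exactly the scaling benefit the abstract claims.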
Pages: 181 / +
Page count: 2