Successive Reduction of Arms in Multi-Armed Bandits

Cited by: 1
Authors
Gupta, Neha [1 ]
Granmo, Ole-Christoffer [2 ]
Agrawala, Ashok [1 ]
Affiliations
[1] University of Maryland, College Park, MD 20742, USA
[2] University of Agder, Kristiansand, Norway
DOI
10.1007/978-1-4471-2318-7_13
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The relevance of the multi-armed bandit problem has risen in the past few years with the need for online optimization techniques in Internet systems, such as online advertisement and news article recommendation. At the same time, these applications reveal that state-of-the-art solution schemes do not scale well with the number of bandit arms. In this paper, we present two types of Successive Reduction (SR) strategies: (1) Successive Reduction Hoeffding (SRH) and (2) Successive Reduction Order Statistics (SRO). Both use an order-statistics-based Thompson Sampling method for arm selection, and then successively eliminate bandit arms from consideration based on a confidence threshold. SRH uses Hoeffding bounds for elimination, while SRO measures confidence by the probability that an arm is superior to the currently selected arm. A computationally efficient scheme for pairwise calculation of the latter probability is also presented in this paper. With SR strategies, sampling resources and arm pulls are not wasted on arms that are unlikely to be optimal. To demonstrate the scalability of the proposed schemes, we compare them with two state-of-the-art approaches, namely pure Thompson Sampling and UCB-Tuned. The empirical results are conclusive: the performance advantage of the proposed SRO scheme grows steadily with the number of bandit arms, while the SRH scheme performs on par with pure Thompson Sampling. We therefore believe that SR algorithms open the way to improved performance in Internet-based online optimization and to tackling larger problems.
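To make the abstract's two-step structure concrete, the following is a minimal Python sketch of an SRH-style run on simulated Bernoulli arms: Thompson Sampling selects among the surviving arms, and a Hoeffding bound eliminates arms whose upper confidence bound falls below the lower bound of the empirically best arm. The function names, the confidence radius, and the Monte Carlo stand-in for the SRO superiority probability are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def srh_sketch(true_means, horizon, delta=0.05):
    """Sketch of a Successive Reduction (SRH-style) run on simulated
    Bernoulli arms: Thompson Sampling over the surviving arms, with
    Hoeffding-bound elimination. Illustrative only."""
    k = len(true_means)
    active = set(range(k))
    wins = [0] * k    # observed successes per arm
    losses = [0] * k  # observed failures per arm

    def radius(a):
        # Hoeffding-style confidence radius; the union-bound factor
        # 2*k*horizon/delta is one common (assumed) choice.
        n = max(wins[a] + losses[a], 1)
        return math.sqrt(math.log(2 * k * horizon / delta) / (2 * n))

    for _ in range(horizon):
        # Thompson Sampling step: one Beta(1+wins, 1+losses) draw per
        # surviving arm; play the arm with the largest sample.
        arm = max(active,
                  key=lambda a: random.betavariate(wins[a] + 1, losses[a] + 1))
        reward = 1 if random.random() < true_means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward

        # Successive reduction step: discard clearly inferior arms
        # (the empirically best arm always survives its own test).
        mean = {a: wins[a] / max(wins[a] + losses[a], 1) for a in active}
        best = max(active, key=lambda a: mean[a])
        active = {a for a in active
                  if mean[a] + radius(a) >= mean[best] - radius(best)}
    return best, active


def prob_superior(w_i, l_i, w_j, l_j, samples=10_000):
    """Brute-force Monte Carlo stand-in for the SRO confidence measure
    Pr[arm i beats arm j] under Beta posteriors; the paper describes a
    more efficient pairwise computation."""
    hits = sum(random.betavariate(w_i + 1, l_i + 1) >
               random.betavariate(w_j + 1, l_j + 1)
               for _ in range(samples))
    return hits / samples
```

For example, `srh_sketch([0.10] * 49 + [0.15], horizon=20_000)` mimics the large-arm-count regime the abstract targets: most of the 50 arms are eliminated early, so later pulls concentrate on the few plausible candidates rather than being spread across all arms.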
Pages: 181+
Number of pages: 2