Optimal Algorithms for Multiplayer Multi-Armed Bandits

被引:0
|
作者
Wang, Po-An [1 ]
Proutiere, Alexandre [1 ]
Ariu, Kaito [1 ]
Jedra, Yassir [1 ]
Russo, Alessio [1 ]
机构
[1] Royal Inst Technol, KTH, Stockholm, Sweden
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The paper addresses various Multiplayer Multi-Armed Bandit (MMAB) problems, where M decision-makers, or players, collaborate to maximize their cumulative reward. We first investigate the MMAB problem where players selecting the same arms experience a collision (and are aware of it) and do not collect any reward. For this problem, we present DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as that obtained by an optimal centralized algorithm. DPE1 is simpler than the state-of-the-art algorithm SIC-MMAB Boursier and Pen-het (2019), and yet offers better performance guarantees. We then study the MMAB problem without collision, where players may select the same arm. Players sit on vertices of a graph, and in each round, they are able to send a message to their neighbours in the graph. We present DPE2, a simple and asymptotically optimal algorithm that outperforms the state-of-the-art algorithm DD-UCB Martinez-Rubio et al. (2019). Besides, under DPE2, the expected number of bits transmitted by the players in the graph is finite.
引用
收藏
页数:9
相关论文
共 50 条
  • [1] Optimal Streaming Algorithms for Multi-Armed Bandits
    Jin, Tianyuan
    Huang, Keke
    Tang, Jing
    Xiao, Xiaokui
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [2] Multiplayer Modeling via Multi-Armed Bandits
    Gray, Robert C.
    Zhu, Jichen
    Ontanon, Santiago
    [J]. 2021 IEEE CONFERENCE ON GAMES (COG), 2021, : 695 - 702
  • [3] Generic Asymptotically Optimal Algorithms for Multi-Armed Bandits
    Combes, Richard
    Magureanu, Stefan
    Proutiere, Alexandre
    [J]. 2018 56TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2018, : 152 - 152
  • [4] Anytime optimal algorithms in stochastic multi-armed bandits
    Degenne, Remy
    Perchet, Vianney
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [5] Optimal Algorithms for Range Searching over Multi-Armed Bandits
    Barman, Siddharth
    Krishnamurthy, Ramakrishnan
    Rahul, Saladi
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2177 - 2183
  • [6] On Optimal Foraging and Multi-armed Bandits
    Srivastava, Vaibhav
    Reverdy, Paul
    Leonard, Naomi E.
    [J]. 2013 51ST ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2013, : 494 - 499
  • [7] Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards
    Lee, Kyungjae
    Yang, Hongjun
    Lim, Sungbin
    Oh, Songhwai
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [8] Quantum greedy algorithms for multi-armed bandits
    Hiroshi Ohno
    [J]. Quantum Information Processing, 22
  • [9] Quantum Exploration Algorithms for Multi-Armed Bandits
    Wang, Daochen
    You, Xuchen
    Li, Tongyang
    Childs, Andrew M.
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10102 - 10110
  • [10] Algorithms for Differentially Private Multi-Armed Bandits
    Tossou, Aristide C. Y.
    Dimitrakakis, Christos
    [J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2087 - 2093