Optimal Algorithms for Multiplayer Multi-Armed Bandits

Cited by: 0

Authors:

Wang, Po-An [1]

Proutiere, Alexandre [1]

Ariu, Kaito [1]

Jedra, Yassir [1]

Russo, Alessio [1]

Affiliation:

[1] Royal Inst Technol, KTH, Stockholm, Sweden

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108 | 2020 / Vol. 108

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The paper addresses various Multiplayer Multi-Armed Bandit (MMAB) problems, in which M decision-makers, or players, collaborate to maximize their cumulative reward. We first investigate the MMAB problem in which players selecting the same arm experience a collision (and are aware of it) and collect no reward. For this problem, we present DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as an optimal centralized algorithm. DPE1 is simpler than the state-of-the-art algorithm SIC-MMAB (Boursier and Perchet, 2019), yet offers better performance guarantees. We then study the MMAB problem without collisions, where players may select the same arm. Players sit on the vertices of a graph, and in each round they can send a message to their neighbours in the graph. We present DPE2, a simple and asymptotically optimal algorithm that outperforms the state-of-the-art algorithm DD-UCB (Martinez-Rubio et al., 2019). In addition, under DPE2 the expected number of bits transmitted by the players in the graph is finite.
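
The two reward models in the abstract can be made concrete with a small simulation. This is a minimal sketch, not DPE1 or DPE2: it only shows how a round is scored with and without collisions, and how regret against a centralized oracle is commonly measured. Bernoulli rewards and the helper names `play_round` and `pseudo_regret` are assumptions for illustration.

```python
import random
from collections import Counter


def play_round(means, choices, collisions=True, rng=random):
    """Score one round. Hypothetical helper; Bernoulli rewards are assumed.

    means: list of arm means; choices: one arm index per player.
    With collisions=True, a player sharing its arm with another player gets
    reward 0 and observes the collision. Otherwise every player draws an
    independent sample from its arm.
    Returns a list of (reward, collided) pairs, one per player.
    """
    counts = Counter(choices)
    outcomes = []
    for arm in choices:
        collided = collisions and counts[arm] > 1
        reward = 0 if collided else int(rng.random() < means[arm])
        outcomes.append((reward, collided))
    return outcomes


def pseudo_regret(means, choices_per_round, collisions=True):
    """Expected regret of a sequence of joint choices (assumes >= 1 round).

    Hypothetical helper: compares expected collected reward with a
    centralized oracle that knows the arm means.
    """
    m = len(choices_per_round[0])  # number of players M
    if collisions:
        # Oracle places the M players on the M best arms, one per arm.
        best = sum(sorted(means, reverse=True)[:m])
    else:
        # Without collisions every player can sit on the single best arm.
        best = m * max(means)
    total = 0.0
    for choices in choices_per_round:
        counts = Counter(choices)
        earned = sum(means[a] for a in choices
                     if not collisions or counts[a] == 1)
        total += best - earned
    return total
```

For example, with arm means `[0.9, 0.8, 0.5, 0.1]` and two players, a round in which both players pick arm 0 costs 1.7 in expected reward under the collision model, but nothing in the model without collisions, where both players are entitled to sample the best arm.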

Pages: 9
