Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits

Cited by: 0
Authors
Chakraborty, Mithun [1]
Chua, Kai Yee Phoebe [2]
Das, Sanmay [1]
Juba, Brendan [1]
Affiliations
[1] Washington Univ, St Louis, MO USA
[2] Univ Calif Irvine, Irvine, CA USA
Funding
National Science Foundation (USA);
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we introduce a multi-agent multi-armed bandit model for ad hoc teamwork with expensive communication. The team's goal is to maximize the total reward gained from pulling arms of a bandit over a number of epochs. In each epoch, each agent decides whether to pull an arm, or to forgo pulling and instead broadcast to the team the reward it obtained in the previous epoch. These decisions must be made solely on the basis of the agent's private information and the public information broadcast prior to that epoch. We first benchmark the achievable utility by analyzing an idealized version of the problem in which a central authority has complete knowledge of the rewards acquired from all arms in all epochs and uses a multiplicative weights update algorithm to allocate arms to agents. We then introduce an algorithm for the decentralized setting that combines a value-of-information-based communication strategy with an exploration-exploitation strategy derived from the centralized algorithm, and show experimentally that it converges rapidly to the performance of the centralized method.
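To make the centralized benchmark concrete, the following is a minimal illustrative sketch of a full-information multiplicative weights allocator in the spirit described by the abstract. All specifics here (the learning rate `eta`, the Bernoulli reward model, and proportional sampling of arm assignments) are assumptions made for this example, not the paper's actual algorithm.

```python
import random

def allocate(weights, n_agents, rng):
    """Assign each agent an arm, sampling proportionally to arm weights."""
    total = sum(weights)
    probs = [w / total for w in weights]
    return [rng.choices(range(len(weights)), weights=probs)[0]
            for _ in range(n_agents)]

def mwu_bandit(arm_means, n_agents, n_epochs, eta=0.1, seed=0):
    """Centralized benchmark sketch: the central authority observes the
    rewards of *all* arms in every epoch (full information) and updates
    every arm's weight multiplicatively."""
    rng = random.Random(seed)
    weights = [1.0] * len(arm_means)
    total_reward = 0.0
    for _ in range(n_epochs):
        assignment = allocate(weights, n_agents, rng)
        # Bernoulli rewards, one draw per arm per epoch (assumed model).
        rewards = [1.0 if rng.random() < mu else 0.0 for mu in arm_means]
        total_reward += sum(rewards[a] for a in assignment)
        # Multiplicative weights update on full-information feedback.
        weights = [w * (1.0 + eta * r) for w, r in zip(weights, rewards)]
    return total_reward, weights

total, w = mwu_bandit([0.2, 0.5, 0.8], n_agents=4, n_epochs=500)
# Over time the weights concentrate on the highest-mean arm, so the
# proportional allocation sends most agents to the best arm.
```

Because the authority sees every arm's reward each epoch, this is a full-information setting rather than a bandit-feedback one, which is what makes it a natural upper benchmark for the decentralized algorithm.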
Pages: 164-170
Page count: 7
Related Papers
50 records in total
  • [1] Decentralized Exploration in Multi-Armed Bandits
    Feraud, Raphael
    Alami, Reda
    Laroche, Romain
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [2] MULTI-ARMED BANDITS IN MULTI-AGENT NETWORKS
    Shahrampour, Shahin
    Rakhlin, Alexander
    Jadbabaie, Ali
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2786 - 2790
  • [3] Multi-Agent Multi-Armed Bandits with Limited Communication
    Agarwal, Mridul
    Aggarwal, Vaneet
    Azizzadenesheli, Kamyar
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23 : 1 - 24
  • [4] Fair Algorithms for Multi-Agent Multi-Armed Bandits
    Hossain, Safwan
    Micha, Evi
    Shah, Nisarg
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [5] Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits
    Chawla, Ronshee
    Vial, Daniel
    Shakkottai, Sanjay
    Srikant, R.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [6] Distributed cooperative decision making in multi-agent multi-armed bandits
    Landgren, Peter
    Srivastava, Vaibhav
    Leonard, Naomi Ehrich
    [J]. AUTOMATICA, 2021, 125
  • [7] Multi-Armed Bandits for Spectrum Allocation in Multi-Agent Channel Bonding WLANs
    Barrachina-Munoz, Sergio
    Chiumento, Alessandro
    Bellalta, Boris
    [J]. IEEE ACCESS, 2021, 9 : 133472 - 133490
  • [8] Statistical and Computational Trade-off in Multi-Agent Multi-Armed Bandits
    Vannella, Filippo
    Proutiere, Alexandre
    Jeong, Jaeseong
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [9] Decentralized Learning for Multi-player Multi-armed Bandits
    Kalathil, Dileep
    Nayyar, Naumaan
    Jain, Rahul
    [J]. 2012 IEEE 51ST ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2012, : 3960 - 3965
  • [10] On Interruptible Pure Exploration in Multi-Armed Bandits
    Shleyfman, Alexander
    Komenda, Antonin
    Domshlak, Carmel
    [J]. PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 3592 - 3598