Collaborative Multi-agent Stochastic Linear Bandits

Cited by: 0

Authors:

Moradipari, Ahmadreza [1]

Ghavamzadeh, Mohammad [2]

Alizadeh, Mahnoosh [1]

Affiliations:

[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA

[2] Google Res, Mountain View, CA USA

Source:

2022 AMERICAN CONTROL CONFERENCE, ACC | 2022

Funding:

U.S. National Science Foundation;

Keywords:

FINITE-TIME ANALYSIS;

DOI:

Not available

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

We study a collaborative multi-agent stochastic linear bandit setting, where N agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward parameter) and the goal is to select the best global action w.r.t. the average of their reward parameters. At each round, each agent proposes an action, and one action is randomly selected and played as the network action. All the agents observe the corresponding rewards of the played action, and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents. We propose a distributed upper confidence bound (UCB) algorithm and prove a high probability bound on its T-round regret in which we include a linear growth of regret associated with each communication round. Our regret bound is of order $O\left(\sqrt{\frac{T}{N \log(1/|\lambda_2|)}} \cdot (\log T)^2\right)$, where $\lambda_2$ is the second largest (in absolute value) eigenvalue of the communication matrix.
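
The role of the communication matrix in the abstract can be illustrated with a small sketch. The code below is not the paper's algorithm: it runs plain (non-accelerated) consensus averaging, and the ring topology, the 1/3 mixing weights, the reward values, and the function names `ring_matrix` and `consensus` are all assumptions made for illustration. It shows only how repeated local averaging with a doubly stochastic matrix drives every agent's estimate toward the network-wide average reward, at a rate governed by the second largest eigenvalue magnitude of that matrix.

```python
def ring_matrix(n):
    """Hypothetical communication matrix for a ring of n agents.

    Each agent gives weight 1/3 to itself and to each of its two
    neighbours, so the matrix is symmetric and doubly stochastic.
    """
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            P[i][j % n] += 1.0 / 3.0
    return P


def consensus(P, values, rounds):
    """Run plain consensus: x <- P x, repeated `rounds` times."""
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        x = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


# Hypothetical rewards observed by 5 agents for the played network action.
rewards = [1.0, 0.0, 4.0, 2.0, 3.0]
P = ring_matrix(len(rewards))

target = sum(rewards) / len(rewards)          # true network average
estimates = consensus(P, rewards, rounds=30)  # each agent's local estimate
max_error = max(abs(e - target) for e in estimates)
print(max_error)
```

Because the matrix is doubly stochastic, each averaging step preserves the sum of the agents' values, so the estimates can only converge to the true average. A smaller second eigenvalue magnitude means fewer communication rounds are needed for a given accuracy, which is consistent with the $\log(1/|\lambda_2|)$ term in the stated regret bound.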


Pages: 2761-2766

Page count: 6

50 items in total

[1] Multi-agent Heterogeneous Stochastic Linear Bandits
Ghosh, Avishek
Sankararaman, Abishek
Ramchandran, Kannan
[J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT IV, 2023, 13716 : 300 - 316
[2] Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits
Chawla, Ronshee
Vial, Daniel
Shakkottai, Sanjay
Srikant, R.
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
[3] Decentralized Multi-Agent Linear Bandits with Safety Constraints
Amani, Sanae
Thrampoulidis, Christos
[J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 6627 - 6635
[4] Multi-Agent Learning with Heterogeneous Linear Contextual Bandits
Anh Do
Thanh Nguyen-Tang
Arora, Raman
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[5] MULTI-ARMED BANDITS IN MULTI-AGENT NETWORKS
Shahrampour, Shahin
Rakhlin, Alexander
Jadbabaie, Ali
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2786 - 2790
[6] Cooperative Multi-Agent Bandits with Heavy Tails
Dubey, Abhimanyu
Pentland, Alex
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
[7] Cooperative Multi-Agent Bandits with Heavy Tails
Dubey, Abhimanyu
Pentland, Alex
[J]. 25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
[8] Collaborative Multi-agent System for Automatic Linear Text Segmentation
Perotto, Filipo Studzinski
[J]. PRIMA 2022: PRINCIPLES AND PRACTICE OF MULTI-AGENT SYSTEMS, 2023, 13753 : 573 - 581
[9] Fair Algorithms for Multi-Agent Multi-Armed Bandits
Hossain, Safwan
Micha, Evi
Shah, Nisarg
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[10] Multi-Agent Multi-Armed Bandits with Limited Communication
Agarwal, Mridul
Aggarwal, Vaneet
Azizzadenesheli, Kamyar
[J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23 : 1 - 24
