Collaborative Multi-agent Stochastic Linear Bandits

Times cited: 0
Authors
Moradipari, Ahmadreza [1]
Ghavamzadeh, Mohammad [2]
Alizadeh, Mahnoosh [1]
Affiliations
[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
[2] Google Res, Mountain View, CA USA
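Funding
National Science Foundation (USA)
Keywords
FINITE-TIME ANALYSIS
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract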
We study a collaborative multi-agent stochastic linear bandit setting, where N agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward parameter) and the goal is to select the best global action with respect to the average of their reward parameters. At each round, each agent proposes an action, and one action is randomly selected and played as the network action. All the agents observe the corresponding rewards of the played action, and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents. We propose a distributed upper confidence bound (UCB) algorithm and prove a high-probability bound on its T-round regret in which we include a linear growth of regret associated with each communication round. Our regret bound is of order $\mathcal{O}\left(\sqrt{\frac{T}{N \log(1/|\lambda_2|)}} \cdot (\log T)^2\right)$, where $\lambda_2$ is the second largest (in absolute value) eigenvalue of the communication matrix.
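The role of $\lambda_2$ in the consensus step can be illustrated with a small sketch. Note the assumptions: the code below uses plain (non-accelerated) gossip averaging rather than the accelerated consensus procedure of the paper, and the ring topology, the weights, and the names `ring_weights`, `gossip`, and `max_deviation` are hypothetical choices made only for illustration. With a symmetric, doubly stochastic communication matrix, each round shrinks the disagreement between agents by a factor of at most $|\lambda_2|$, so the number of rounds needed for a fixed accuracy scales like $1/\log(1/|\lambda_2|)$.

```python
def ring_weights(n):
    """Symmetric, doubly stochastic weights for a ring of n >= 3 agents (assumed topology)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.5                  # weight an agent keeps on its own value
        W[i][(i - 1) % n] += 0.25      # weight on the left neighbor
        W[i][(i + 1) % n] += 0.25      # weight on the right neighbor
    return W


def gossip(W, values, rounds):
    """Run `rounds` of plain consensus: every agent replaces its value
    with a weighted average of its own and its neighbors' values."""
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        x = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


def max_deviation(x):
    """Largest gap between any agent's estimate and the network average."""
    mean = sum(x) / len(x)
    return max(abs(v - mean) for v in x)


# Four agents observe different rewards for the same played action
# (made-up numbers); the network average is 2.0.
rewards = [1.0, 0.0, 4.0, 3.0]
W = ring_weights(4)          # for this matrix, |lambda_2| = 0.5
estimates = gossip(W, rewards, 10)
print(estimates, max_deviation(estimates))
```

For this four-agent ring the second largest eigenvalue in absolute value is 0.5, so ten rounds bring every agent's estimate to within roughly 0.003 of the true average, while the average itself is preserved at every round.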
Pages: 2761-2766
Number of pages: 6