Kernel Methods for Cooperative Multi-Agent Contextual Bandits

Cited: 0
Authors
Dubey, Abhimanyu [1 ,2 ]
Pentland, Alex [1 ,2 ]
Institutions
[1] MIT, Media Lab, Cambridge, MA 02139 USA
[2] MIT, Inst Data Syst & Soc, Cambridge, MA 02139 USA
Keywords
MULTIARMED BANDIT;
DOI
Not available
CLC Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Cooperative multi-agent decision making involves a group of agents cooperatively solving learning problems while communicating over a network with delays. In this paper, we consider the kernelised contextual bandit problem, where the reward obtained by an agent is an arbitrary linear function of the contexts' images in the related reproducing kernel Hilbert space (RKHS), and a group of agents must cooperate to collectively solve their unique decision problems. For this problem, we propose COOP-KERNELUCB, an algorithm that provides near-optimal bounds on the per-agent regret and is both computationally and communicatively efficient. For special cases of the cooperative problem, we also provide variants of COOP-KERNELUCB that achieve optimal per-agent regret. In addition, our algorithm generalizes several existing results in the multi-agent bandit setting. Finally, on a series of both synthetic and real-world multi-agent network benchmarks, we demonstrate that our algorithm significantly outperforms existing methods.
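The abstract builds on KernelUCB as its single-agent primitive. As a rough illustration of that building block only (not the paper's cooperative COOP-KERNELUCB algorithm), the sketch below forms upper-confidence scores from a kernel-ridge posterior mean and variance over observed contexts; the RBF kernel and the hyperparameters `lam`, `beta`, and `gamma` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    # RBF kernel matrix between the rows of a and the rows of b
    d = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * d)

class KernelUCB:
    """Minimal single-agent KernelUCB sketch (hyperparameters are illustrative)."""

    def __init__(self, lam=1.0, beta=1.0, gamma=1.0):
        self.lam, self.beta, self.gamma = lam, beta, gamma
        self.X, self.y = [], []  # observed contexts and rewards

    def ucb(self, contexts):
        # contexts: (n_arms, d) array; returns one UCB score per arm
        if not self.X:
            return np.full(len(contexts), np.inf)  # force initial exploration
        X = np.vstack(self.X)
        y = np.array(self.y)
        # Regularised kernel matrix over past contexts, inverted naively;
        # practical implementations update this incrementally.
        Kinv = np.linalg.inv(rbf(X, X, self.gamma) + self.lam * np.eye(len(X)))
        k = rbf(X, contexts, self.gamma)                       # (t, n_arms)
        mean = k.T @ Kinv @ y                                  # ridge-regression mean
        var = np.diag(rbf(contexts, contexts, self.gamma)) \
              - np.sum(k * (Kinv @ k), axis=0)                 # posterior variance
        return mean + self.beta * np.sqrt(np.maximum(var, 0))

    def update(self, context, reward):
        self.X.append(context)
        self.y.append(reward)
```

Picking `argmax` of `ucb(...)` each round and calling `update` with the observed reward gives the usual optimism-in-the-face-of-uncertainty loop; the cooperative variant in the paper additionally shares information between agents over a delayed network.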
Pages: 11