A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem

Cited by: 13
Authors
Madhushani, Udari [1]
Leonard, Naomi Ehrich [1]
Affiliation
[1] Princeton Univ, Dept Mech & Aerosp Engn, Princeton, NJ 08544 USA
Keywords
ALLOCATION;
DOI
10.23919/ecc51009.2020.9143736
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors under a linear observation cost. Neighbors are defined by a network graph that encodes the inherent observation constraints of the system. We define a cost associated with observations such that each time an agent makes an observation, it incurs a constant observation regret. We design a sampling algorithm and an observation protocol for each agent to maximize its own expected cumulative reward by minimizing its expected cumulative sampling regret and expected cumulative observation regret. For our proposed protocol, we prove that the total cumulative regret is logarithmically bounded. We verify the accuracy of the analytical bounds using numerical simulations.
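The two regret terms named in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or protocol: it only assumes that sampling regret follows the standard bandit definition (gap to the best arm times the number of pulls) and that total regret is the sum of sampling regret and a constant cost per observation. The function name `total_regret` and its parameters are hypothetical.

```python
from typing import Sequence


def total_regret(
    arm_means: Sequence[float],
    pulls: Sequence[int],
    num_observations: int,
    obs_cost: float,
) -> float:
    """Sum of sampling regret and observation regret for one agent.

    Assumptions (not taken from the paper): sampling regret is the
    standard sum over arms of (best mean - arm mean) * pulls, and each
    observation adds the same constant cost `obs_cost`.
    """
    if len(arm_means) != len(pulls):
        raise ValueError("arm_means and pulls must have the same length")

    best_mean = max(arm_means)
    # Regret from pulling suboptimal arms.
    sampling = sum((best_mean - mu) * n for mu, n in zip(arm_means, pulls))
    # Regret from observing neighbors: linear in the number of observations.
    observation = obs_cost * num_observations
    return sampling + observation
```

For instance, an agent that pulls a suboptimal arm 20 times with a gap of 0.4 and makes 10 observations at cost 0.1 each would accumulate a sampling regret of about 8 and an observation regret of about 1 under these assumptions.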
Pages: 1677 - 1682
Number of pages: 6