A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem

Cited by: 13

Authors:

Madhushani, Udari [1]

Leonard, Naomi Ehrich [1]

Affiliations:

[1] Princeton Univ, Dept Mech & Aerosp Engn, Princeton, NJ 08544 USA

Source:

2020 EUROPEAN CONTROL CONFERENCE (ECC 2020) | 2020

Keywords:

ALLOCATION;

DOI:

10.23919/ecc51009.2020.9143736

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors under a linear observation cost. Neighbors are defined by a network graph that encodes the inherent observation constraints of the system. Each time an agent makes an observation, it incurs a constant observation regret. We design a sampling algorithm and an observation protocol for each agent to maximize its own expected cumulative reward by minimizing its expected cumulative sampling regret and expected cumulative observation regret. For our proposed protocol, we prove that the total cumulative regret is logarithmically bounded. We verify the accuracy of the analytical bounds using numerical simulations.


Pages: 1677-1682

Number of pages: 6

50 records in total

[11] A Multi-Armed Bandit Strategy for Countermeasure Selection
Cochrane, Madeleine
Hunjet, Robert
2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2020, : 2510 - 2515
[12] ON MULTI-ARMED BANDIT PROBLEM WITH NUISANCE PARAMETER
孙嘉阳
Science China Mathematics, 1986, (05) : 464 - 475
[13] Robust control of the multi-armed bandit problem
Caro, Felipe
Das Gupta, Aparupa
ANNALS OF OPERATIONS RESEARCH, 2022, 317 (02) : 461 - 480
[14] An Adaptive Algorithm in Multi-Armed Bandit Problem
Zhang X.
Zhou Q.
Liang B.
Xu J.
Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56 (03) : 643 - 654
[15] Robust control of the multi-armed bandit problem
Felipe Caro
Aparupa Das Gupta
Annals of Operations Research, 2022, 317 : 461 - 480
[16] Multi-armed bandit problem with known trend
Bouneffouf, Djallel
Feraud, Raphael
NEUROCOMPUTING, 2016, 205 : 16 - 21
[17] DBA: Dynamic Multi-Armed Bandit Algorithm
Nobari, Sadegh
THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9869 - 9870
[18] MULTI-ARMED BANDITS IN MULTI-AGENT NETWORKS
Shahrampour, Shahin
Rakhlin, Alexander
Jadbabaie, Ali
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2786 - 2790
[19] Dynamic Multi-Armed Bandit Algorithm for the Cyclic Bandwidth Sum Problem
Rodriguez-Tello, Eduardo
Narvaez-Teran, Valentina
Lardeux, Frederic
IEEE ACCESS, 2019, 7 : 40258 - 40270
[20] Multi-Agent Multi-Armed Bandit Learning for Online Management of Edge-Assisted Computing
Wu, Bochun
Chen, Tianyi
Ni, Wei
Wang, Xin
IEEE TRANSACTIONS ON COMMUNICATIONS, 2021, 69 (12) : 8188 - 8199
