A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem

Times cited: 13

Authors:

Madhushani, Udari [1]

Leonard, Naomi Ehrich [1]

Affiliation:

[1] Princeton Univ, Dept Mech & Aerosp Engn, Princeton, NJ 08544 USA

Source:

2020 EUROPEAN CONTROL CONFERENCE (ECC 2020) | 2020

Keywords:

ALLOCATION;

DOI:

10.23919/ecc51009.2020.9143736

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors under a linear observation cost. Neighbors are defined by a network graph that encodes the inherent observation constraints of the system. We define the observation cost such that each time an agent makes an observation, it incurs a constant observation regret. We design a sampling algorithm and an observation protocol for each agent to maximize its own expected cumulative reward by minimizing its expected cumulative sampling regret and expected cumulative observation regret. For our proposed protocol, we prove that the total cumulative regret is logarithmically bounded. We verify the accuracy of the analytical bounds using numerical simulations.
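The abstract does not spell out the sampling rule or the observation protocol, so the following is only a rough illustration of how total regret splits into a sampling part and an observation part. It is a minimal sketch under loudly stated assumptions, not the authors' method. The assumptions are Bernoulli arms, UCB1 as the sampling rule, and a hypothetical rule under which an agent observes (and pools) its neighbors' choices and rewards only in rounds where it explores. Here "explores" means it plays an arm other than its empirically best one. The function name `simulate` and all of its parameters are likewise hypothetical.

```python
import math
import random


def simulate(means, neighbors, horizon, obs_cost, seed=0):
    """Illustrative sketch only, not the paper's algorithm.

    Assumptions (hypothetical): Bernoulli arms, UCB1 sampling, and an
    "observe only when exploring" rule; one obs_cost per observation event.
    Returns (cumulative sampling regret, cumulative observation regret),
    summed over all agents.
    """
    rng = random.Random(seed)
    n_agents, k = len(neighbors), len(means)
    best = max(means)
    # Each agent pools its own samples with any neighbor samples it observed.
    counts = [[0] * k for _ in range(n_agents)]
    sums = [[0.0] * k for _ in range(n_agents)]
    sampling_regret = 0.0
    observation_regret = 0.0
    for t in range(1, horizon + 1):
        choices, rewards, exploring = [], [], []
        for i in range(n_agents):
            if 0 in counts[i]:
                arm = counts[i].index(0)  # play every arm once first
                explore = True
            else:
                mean = [sums[i][a] / counts[i][a] for a in range(k)]
                ucb = [mean[a] + math.sqrt(2 * math.log(t) / counts[i][a]) for a in range(k)]
                arm = max(range(k), key=lambda a: ucb[a])
                # Hypothetical: "exploring" = UCB pick differs from greedy pick.
                explore = arm != max(range(k), key=lambda a: mean[a])
            rewards.append(1.0 if rng.random() < means[arm] else 0.0)
            choices.append(arm)
            exploring.append(explore)
            sampling_regret += best - means[arm]  # expected-gap regret
        for i in range(n_agents):
            counts[i][choices[i]] += 1
            sums[i][choices[i]] += rewards[i]
            # Hypothetical observation rule: pay the constant cost and pool
            # the neighbors' (choice, reward) pairs only in exploring rounds.
            if exploring[i] and neighbors[i]:
                observation_regret += obs_cost
                for j in neighbors[i]:
                    counts[i][choices[j]] += 1
                    sums[i][choices[j]] += rewards[j]
    return sampling_regret, observation_regret


# Usage: 3 agents on a path graph 0 - 1 - 2, three arms.
s, o = simulate([0.9, 0.5, 0.3], [[1], [0, 2], [1]], horizon=2000, obs_cost=0.1)
print(f"sampling regret {s:.1f}, observation regret {o:.1f}, total {s + o:.1f}")
```

The sketch charges `obs_cost` once per observation event rather than once per observed neighbor. That is also an assumption. The abstract only states that each observation incurs a constant observation regret.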


Pages: 1677-1682

Number of pages: 6
