A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem

Cited by: 13
Authors
Madhushani, Udari [1]
Leonard, Naomi Ehrich [1]
Affiliations
[1] Princeton Univ, Dept Mech & Aerosp Engn, Princeton, NJ 08544 USA
Keywords
ALLOCATION;
DOI
10.23919/ecc51009.2020.9143736
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors under a linear observation cost. Neighbors are defined by a network graph that encodes the inherent observation constraints of the system. We define a cost associated with observations such that at every instance an agent makes an observation it receives a constant observation regret. We design a sampling algorithm and an observation protocol for each agent to maximize its own expected cumulative reward through minimizing expected cumulative sampling regret and expected cumulative observation regret. For our proposed protocol, we prove that total cumulative regret is logarithmically bounded. We verify the accuracy of analytical bounds using numerical simulations.
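The setting described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' algorithm or observation protocol: it assumes standard UCB1 sampling, Gaussian rewards with unit variance, and a hypothetical rule under which each agent observes one random neighbor with probability 1/t at step t, paying a constant cost per observation. The function name `simulate` and all of its parameters are illustrative assumptions.

```python
import math
import random


def simulate(means, neighbors, horizon, obs_cost=0.1, seed=0):
    """Toy multi-agent bandit: UCB1 sampling plus a decaying observation rule.

    means      -- true mean reward of each arm (assumed Gaussian, unit variance)
    neighbors  -- neighbors[k] lists the agents that agent k may observe
    obs_cost   -- constant regret charged for every observation (assumption)
    Returns (cumulative sampling regret, cumulative observation regret).
    """
    rng = random.Random(seed)
    n_agents, n_arms = len(neighbors), len(means)
    best = max(means)
    counts = [[0] * n_arms for _ in range(n_agents)]  # own samples + observations
    sums = [[0.0] * n_arms for _ in range(n_agents)]  # accumulated rewards
    sampling_regret = 0.0
    observation_regret = 0.0

    for t in range(1, horizon + 1):
        choices, rewards = [], []
        # Every agent picks an arm from the information it held before this step.
        for k in range(n_agents):
            untried = [a for a in range(n_arms) if counts[k][a] == 0]
            if untried:
                arm = untried[0]
            else:
                arm = max(
                    range(n_arms),
                    key=lambda a: sums[k][a] / counts[k][a]
                    + math.sqrt(2.0 * math.log(t) / counts[k][a]),
                )
            choices.append(arm)
            rewards.append(rng.gauss(means[arm], 1.0))
            sampling_regret += best - means[arm]

        # Each agent records its own sample.
        for k in range(n_agents):
            counts[k][choices[k]] += 1
            sums[k][choices[k]] += rewards[k]

        # Hypothetical protocol: observe one neighbor with probability 1/t.
        for k in range(n_agents):
            if neighbors[k] and rng.random() < 1.0 / t:
                j = rng.choice(neighbors[k])
                counts[k][choices[j]] += 1
                sums[k][choices[j]] += rewards[j]
                observation_regret += obs_cost

    return sampling_regret, observation_regret
```

For example, `simulate([0.2, 0.5, 0.9], [[1], [0, 2], [1]], horizon=1000)` runs three agents on a line graph and returns the two regret components separately, so their sum can be compared against a logarithmic curve. The 1/t observation rate is chosen here only because it keeps the expected number of observations logarithmic in the horizon; the paper's actual protocol may differ.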
Pages: 1677-1682
Number of pages: 6