A Multi-armed Bandit Algorithm Available in Stationary or Non-stationary Environments Using Self-organizing Maps

Cited by: 1

Authors:

Manome, Nobuhito [1,2]

Shinohara, Shuji [2]

Suzuki, Kouta [1,2]

Tomonaga, Kosuke [1,2]

Mitsuyoshi, Shunji [2]

Affiliations:

[1] SoftBank Robot Grp Corp, Tokyo, Japan

[2] Univ Tokyo, Tokyo, Japan

Source:

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: THEORETICAL NEURAL COMPUTATION, PT I | 2019 / Vol. 11727

Keywords:

Multi-armed bandit problem; Self-organizing maps; Sequential decision making

DOI:

10.1007/978-3-030-30487-4_41

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Because many courses of action are possible, communication robots designed to satisfy the users facing them must take appropriate action more rapidly. In practice, however, user requests often change while these robots are still determining the most appropriate actions for those users, which makes it difficult for the robots to derive an appropriate course of action. This issue has been formalized as the "multi-armed bandit (MAB) problem." The MAB problem refers to an environment featuring multiple levers (arms), where pulling an arm yields a reward with a certain probability; the task is to determine how to select the arms to pull so as to maximize the rewards gained. To solve this problem, we considered a new MAB algorithm using self-organizing maps that is adaptable to both stationary and non-stationary environments. For this paper, numerous experiments were conducted on a stochastic MAB problem in both stationary and non-stationary environments. As a result, we determined that, compared to the existing UCB1, UCB1-Tuned, and Thompson Sampling algorithms, the proposed algorithm demonstrated equivalent or improved capability in stationary environments with numerous arms and consistently strong effectiveness in a non-stationary environment.
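
The stochastic MAB setting described in the abstract can be illustrated with a minimal Python sketch. It is not the self-organizing-map algorithm of the paper, whose details are not given in this record; it only shows a Bernoulli bandit environment, optionally made non-stationary by changing the reward probabilities partway through, played with the standard UCB1 index (one of the baselines named in the abstract). The names `BernoulliBandit`, `ucb1_select`, and `run`, as well as the reward probabilities, step counts, and switch point, are assumptions chosen for illustration.

```python
import math
import random


class BernoulliBandit:
    """Stochastic MAB: arm i pays reward 1 with probability probs[i], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1 if self.rng.random() < self.probs[arm] else 0


def ucb1_select(counts, sums, t):
    """Return the arm with the highest UCB1 index at step t (1-based)."""
    for arm, n in enumerate(counts):
        if n == 0:  # play every arm once before using the index
            return arm
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a]
        + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run(probs_before, probs_after=None, steps=2000, switch_at=1000, seed=0):
    """Play UCB1; if probs_after is given, the environment changes at switch_at."""
    env = BernoulliBandit(probs_before, seed)
    counts = [0] * len(probs_before)   # pulls per arm
    sums = [0.0] * len(probs_before)   # accumulated reward per arm
    total = 0
    for t in range(1, steps + 1):
        if probs_after is not None and t == switch_at:
            # Non-stationary case (illustrative): reward probabilities shift.
            env.probs = list(probs_after)
        arm = ucb1_select(counts, sums, t)
        reward = env.pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


if __name__ == "__main__":
    print("stationary:", run([0.2, 0.8]))
    print("non-stationary:", run([0.2, 0.8], probs_after=[0.8, 0.2]))
```

In the stationary run, UCB1 concentrates its pulls on the better arm. In the non-stationary run, its estimates average over the whole history, so it reacts slowly after the switch; this is the kind of situation the abstract says the proposed algorithm is meant to handle.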


Pages: 529-540

Number of pages: 12
