A Multi-armed Bandit Algorithm Available in Stationary or Non-stationary Environments Using Self-organizing Maps

Citations: 1
Authors
Manome, Nobuhito [1 ,2 ]
Shinohara, Shuji [2 ]
Suzuki, Kouta [1 ,2 ]
Tomonaga, Kosuke [1 ,2 ]
Mitsuyoshi, Shunji [2 ]
Affiliations
[1] SoftBank Robot Grp Corp, Tokyo, Japan
[2] Univ Tokyo, Tokyo, Japan
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: THEORETICAL NEURAL COMPUTATION, PT I | 2019, Vol. 11727
Keywords
Multi-armed bandit problem; Self-organizing maps; Sequential decision making;
DOI
10.1007/978-3-030-30487-4_41
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Communication robots designed to satisfy the users they face must choose rapidly among a multitude of potential actions. In practice, however, user requests often change while a robot is still determining the most appropriate action, making it difficult to settle on a suitable course of action. This issue has been formalized as the "multi-armed bandit (MAB) problem." The MAB problem concerns an environment with multiple levers (arms), where pulling an arm yields a reward with a certain probability; the task is to decide which levers to pull so as to maximize the cumulative reward. To address this problem, we propose a new MAB algorithm based on self-organizing maps that adapts to both stationary and non-stationary environments. In this paper, numerous experiments were conducted on a stochastic MAB problem in both stationary and non-stationary environments. The results show that, compared with the existing UCB1, UCB1-Tuned, and Thompson Sampling algorithms, the proposed algorithm achieves equivalent or better performance in stationary environments with many arms, and consistently strong performance in a non-stationary environment.
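The abstract benchmarks against UCB1, UCB1-Tuned, and Thompson Sampling. As context for those baselines, here is a minimal sketch of UCB1 (not the authors' SOM-based algorithm) on a Bernoulli bandit; the arm probabilities and the `ucb1` helper are illustrative assumptions, not from the paper:

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """UCB1 baseline for the stochastic multi-armed bandit.

    `pull(i)` returns a reward in [0, 1] for arm i. After pulling
    every arm once, each round plays the arm maximizing the sample
    mean plus the exploration bonus sqrt(2 ln t / n_i).
    """
    counts = [0] * n_arms      # pulls per arm
    sums = [0.0] * n_arms      # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:        # initialization: try every arm once
            arm = t - 1
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

# Illustrative Bernoulli test bed: arm 2 is best (p = 0.9).
random.seed(0)
probs = [0.2, 0.5, 0.9]
total, counts = ucb1(lambda i: 1.0 if random.random() < probs[i] else 0.0,
                     n_arms=3, horizon=2000)
```

In a stationary setting UCB1 concentrates its pulls on the best arm; it is precisely this kind of baseline that the paper's non-stationary experiments stress, since the exploration bonus assumes fixed reward probabilities.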
Pages: 529-540 (12 pages)
Related Papers
50 records
  • [41] EVOLUTIONARY MULTI-AGENT SYSTEMS IN NON-STATIONARY ENVIRONMENTS
    Kisiel-Dorohinicki, Marek
    COMPUTER SCIENCE-AGH, 2013, 14 (04): 563-575
  • [42] Detecting Anomalies by using Self-Organizing Maps in Industrial Environments
    Hormann, Ricardo
    Fischer, Eric
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS SECURITY AND PRIVACY (ICISSP), 2019: 336-344
  • [43] Combining stationary wavelet transform and self-organizing maps for brain MR image segmentation
    Demirhan, Ayse
    Gueler, Inan
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2011, 24 (02): 358-367
  • [44] A Bayesian Multi-Armed Bandit Algorithm for Dynamic End-to-End Routing in SDN-Based Networks with Piecewise-Stationary Rewards
    Santana, Pedro
    Moura, Jose
    ALGORITHMS, 2023, 16 (05)
  • [45] A robust policy bootstrapping algorithm for multi-objective reinforcement learning in non-stationary environments
    Abdelfattah, Sherif
    Kasmarik, Kathryn
    Hu, Jiankun
    ADAPTIVE BEHAVIOR, 2020, 28 (04): 273-292
  • [46] Efficient wireless network selection by using multi-armed bandit algorithm for mobile terminals
    Oshima, Koji
    Onishi, Takuma
    Kim, Song-Ju
    Ma, Jing
    Hasegawa, Mikio
    IEICE NONLINEAR THEORY AND ITS APPLICATIONS, 2020, 11 (01): 68-77
  • [47] Secure Channel Selection Using Multi-Armed Bandit Algorithm in Cognitive Radio Network
    Endo, Masahiro
    Ohtsuki, Tomoaki
    Fujii, Takeo
    Takyu, Osamu
    2017 IEEE 85TH VEHICULAR TECHNOLOGY CONFERENCE (VTC SPRING), 2017
  • [48] A Channel Allocation Algorithm for Cognitive Radio Systems using Restless Multi-armed Bandit
    Lee, Hyuk
    Lee, Jungwoo
    2013 IEEE 78TH VEHICULAR TECHNOLOGY CONFERENCE (VTC FALL), 2013
  • [49] Self-localization in non-stationary environments using omni-directional vision
    Andreasson, Henrik
    Treptow, Andre
    Duckett, Tom
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2007, 55 (07): 541-551
  • [50] On some properties of the B-Cell algorithm in non-stationary environments
    Trojanowski, Krzysztof
    Wierzchon, Slawomir T.
    ADVANCES IN INFORMATION PROCESSING AND PROTECTION, 2007: 35-44