Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems

Cited by: 50

Authors:

Koulouriotis, D. E. [1]

Xanthopoulos, A. [1]

Affiliations:

[1] Democritus Univ Thrace, Sch Engn, Dept Prod & Management Engn, Dragana, Greece

Source:

APPLIED MATHEMATICS AND COMPUTATION | 2008 / Vol. 196 / Issue 02

Keywords:

decision-making agents; action selection; exploration-exploitation; multi-armed bandit; genetic algorithms; reinforcement learning;

DOI:

10.1016/j.amc.2007.07.043

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics];

Discipline classification code:

070104;

Abstract:

Multi-armed bandit tasks have been extensively used to model the problem of balancing exploitation and exploration. A particularly challenging variant of the multi-armed bandit problem (MABP) is the non-stationary bandit problem, where the agent faces the added complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite-horizon bandit problem with a finite number of arms and Gaussian rewards. A family of important ad hoc methods exists that are suitable for non-stationary bandit tasks. These learning algorithms, which offer intuition-based solutions to the exploitation-exploration trade-off, have the advantage of not relying on strong theoretical assumptions, while at the same time they can be fine-tuned to produce near-optimal results. An entirely different approach to the non-stationary multi-armed bandit problem presents itself in the form of evolutionary algorithms. We present an evolutionary algorithm that was implemented to solve the non-stationary bandit problem, along with ad hoc solution algorithms, namely action-value methods with ε-greedy and softmax action selection rules, the probability matching method and, finally, the adaptive pursuit method. A number of simulation-based experiments were conducted, and we discuss the methods' performance based on the numerical results obtained. (C) 2007 Elsevier Inc. All rights reserved.
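As a rough illustration of the action-value methods named in the abstract, the sketch below runs ε-greedy and softmax action selection on a small Gaussian bandit whose arm means change once. It is not the authors' implementation: the abstract gives no experimental details, so the arm means, the single abrupt change at `switch_at`, the constant step size `alpha`, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def select_epsilon_greedy(q, epsilon, rng):
    """With probability epsilon pick a random arm, otherwise the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])


def select_softmax(q, temperature, rng):
    """Sample an arm with probability proportional to exp(q / temperature)."""
    top = max(q)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in q]
    return rng.choices(range(len(q)), weights=weights)[0]


def run(select, param, steps=2000, switch_at=1000, alpha=0.1, seed=0):
    """Average reward on a 3-armed Gaussian bandit whose means reverse once."""
    rng = random.Random(seed)
    means = [1.0, 0.0, -1.0]      # hypothetical arm means before the change
    q = [0.0] * len(means)        # action-value estimates, one per arm
    total = 0.0
    for t in range(steps):
        if t == switch_at:
            means = means[::-1]   # assumed abrupt change: best and worst arms swap
        arm = select(q, param, rng)
        reward = rng.gauss(means[arm], 1.0)
        # A constant step size weights recent rewards more heavily,
        # which lets the estimates track a changing environment.
        q[arm] += alpha * (reward - q[arm])
        total += reward
    return total / steps


if __name__ == "__main__":
    print("epsilon-greedy:", run(select_epsilon_greedy, 0.1))
    print("softmax:", run(select_softmax, 0.5))
```

The constant step size is the design choice that matters here: a sample-average update would weight all past rewards equally and react slowly after the change. The probability matching, adaptive pursuit, and evolutionary methods mentioned in the abstract are not sketched.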


Pages: 913-922

Number of pages: 10

50 items in total

[1] The non-stationary stochastic multi-armed bandit problem
Allesiardo R.
Féraud R.
Maillard O.-A.
Allesiardo, Robin (robin.allesiardo@gmail.com), 1600, Springer Science and Business Media Deutschland GmbH (03): : 267 - 283
[2] Non-stationary stochastic multi-armed bandit problems with external information on stationarity
Namba H.
Transactions of the Japanese Society for Artificial Intelligence, 2021, 36 (03) : D - K84_1
[3] DYNAMIC SPECTRUM ACCESS WITH NON-STATIONARY MULTI-ARMED BANDIT
Alaya-Feki, Afef Ben Hadj
Moulines, Eric
LeCornec, Alain
2008 IEEE 9TH WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS, VOLS 1 AND 2, 2008, : 416 - 420
[4] Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings
Bonnefoi, Remi
Besson, Lilian
Moy, Christophe
Kaufmann, Emilie
Palicot, Jacques
COGNITIVE RADIO ORIENTED WIRELESS NETWORKS, 2018, 228 : 173 - 185
[5] Anytime Algorithms for Multi-Armed Bandit Problems
Kleinberg, Robert
PROCEEDINGS OF THE SEVENTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, : 928 - 936
[6] Foraging decisions as multi-armed bandit problems: Applying reinforcement learning algorithms to foraging data
Morimoto, Juliano
JOURNAL OF THEORETICAL BIOLOGY, 2019, 467 : 48 - 56
[7] Contextual Multi-Armed Bandit With Costly Feature Observation in Non-Stationary Environments
Ghoorchian, Saeed
Kortukov, Evgenii
Maghsudi, Setareh
IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 820 - 830
[8] LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments
de Curto, J.
de Zarza, I.
Roig, Gemma
Cano, Juan Carlos
Manzoni, Pietro
Calafate, Carlos T.
ELECTRONICS, 2023, 12 (13)
[9] Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems
Even-Dar, Eyal
Mannor, Shie
Mansour, Yishay
JOURNAL OF MACHINE LEARNING RESEARCH, 2006, 7 : 1079 - 1105
[10] Multi-armed Bandit Algorithms for Adaptive Learning: A Survey
Mui, John
Lin, Fuhua
Dewan, M. Ali Akber
ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT II, 2021, 12749 : 273 - 278
