Comparing Multi-Armed Bandit Algorithms and Q-learning for Multiagent Action Selection: a Case Study in Route Choice

Cited by: 0
Authors
de Oliveira, Thiago B. F. [1]
Bazzan, Ana L. C. [1]
da Silva, Bruno C. [1]
Grunitzki, Ricardo [1]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The multi-armed bandit (MAB) problem is concerned with an agent choosing which arm of a slot machine to play in order to maximize its reward. A family of reinforcement learning algorithms exists to tackle this problem, including variants that consider more than one agent (thus characterizing a repeated game) and variants that handle non-stationary rewards. In this paper, we seek to evaluate the performance of some of these MAB algorithms and compare them with Q-learning when applied to a non-stationary repeated game, where commuter agents face the task of learning how to choose a route that minimizes their travel times.
Pages: 8
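
For illustration only (this is not the authors' code): a minimal sketch of the two kinds of learners being compared, an epsilon-greedy multi-armed bandit agent and a stateless Q-learning agent, applied to a toy route-choice game where the reward is the negative travel time. The route names, travel-time function, and parameter values below are assumptions made for the example.

    # Sketch (assumed setup, not from the paper): epsilon-greedy bandit vs.
    # stateless Q-learning for repeated route choice; reward = -travel time.
    import random

    ROUTES = ["A", "B", "C"]       # hypothetical route choices (the "arms")
    EPSILON, ALPHA = 0.1, 0.1      # exploration rate and learning rate (assumed)

    def travel_time(route, load):
        """Illustrative congestion model: travel time grows with the number of
        agents that chose the same route, so each agent's reward is
        non-stationary from its own point of view."""
        base = {"A": 10.0, "B": 12.0, "C": 15.0}[route]
        return base + 0.5 * load

    class EpsilonGreedyBandit:
        """Epsilon-greedy MAB agent: keeps a running value estimate per arm."""
        def __init__(self):
            self.q = {r: 0.0 for r in ROUTES}

        def choose(self):
            if random.random() < EPSILON:
                return random.choice(ROUTES)
            return max(self.q, key=self.q.get)

        def update(self, route, reward):
            # constant step size, suited to non-stationary rewards
            self.q[route] += ALPHA * (reward - self.q[route])

    class StatelessQLearning(EpsilonGreedyBandit):
        """In a one-shot (stateless) repeated game the Q-learning update reduces
        to the same incremental rule: the gamma * max_a' Q(s', a') term drops
        because there is no successor state after a route is chosen."""
        pass

    if __name__ == "__main__":
        agents = [EpsilonGreedyBandit() for _ in range(50)]
        for episode in range(200):
            choices = [a.choose() for a in agents]
            for agent, route in zip(agents, choices):
                load = choices.count(route)
                agent.update(route, -travel_time(route, load))
        # average learned value per route across agents
        print({r: sum(a.q[r] for a in agents) / len(agents) for r in ROUTES})

The point of the sketch is that, in this stateless repeated-game setting, the Q-learning update coincides with a constant-step-size bandit update, so the comparison in the paper hinges on exploration and adaptation to non-stationarity rather than on the update rule itself.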