Comparing Multi-Armed Bandit Algorithms and Q-learning for Multiagent Action Selection: a Case Study in Route Choice

Cited by: 0
Authors
de Oliveira, Thiago B. F. [1]
Bazzan, Ana L. C. [1]
da Silva, Bruno C. [1]
Grunitzki, Ricardo [1]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The multi-armed bandit (MAB) problem is concerned with an agent choosing which arm of a slot machine to play in order to maximize its reward. A family of reinforcement learning algorithms exists to tackle this problem, including variants that consider more than one agent (thus characterizing a repeated game) and variants that handle non-stationary rewards. In this paper, we seek to evaluate the performance of some of these MAB algorithms and compare them with Q-learning when applied to a non-stationary repeated game, where commuter agents face the task of learning how to choose a route that minimizes their travel times.
Pages: 8
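
For illustration only (this is not the authors' code): a minimal sketch of the two kinds of learners being compared, an epsilon-greedy multi-armed bandit agent and a stateless Q-learning agent, applied to a toy route-choice game where the reward is the negative travel time. The route names, travel-time function, and parameter values below are assumptions made for the example.

    # Sketch (assumed setup, not from the paper): epsilon-greedy bandit vs.
    # stateless Q-learning for repeated route choice; reward = -travel time.
    import random

    ROUTES = ["A", "B", "C"]       # hypothetical route choices (the "arms")
    EPSILON, ALPHA = 0.1, 0.1      # exploration rate and learning rate (assumed)

    def travel_time(route, load):
        """Illustrative congestion model: travel time grows with the number of
        agents that chose the same route, so each agent's reward is
        non-stationary from its own point of view."""
        base = {"A": 10.0, "B": 12.0, "C": 15.0}[route]
        return base + 0.5 * load

    class EpsilonGreedyBandit:
        """Epsilon-greedy MAB agent: keeps a running value estimate per arm."""
        def __init__(self):
            self.q = {r: 0.0 for r in ROUTES}

        def choose(self):
            if random.random() < EPSILON:
                return random.choice(ROUTES)
            return max(self.q, key=self.q.get)

        def update(self, route, reward):
            # constant step size, suited to non-stationary rewards
            self.q[route] += ALPHA * (reward - self.q[route])

    class StatelessQLearning(EpsilonGreedyBandit):
        """In a one-shot (stateless) repeated game the Q-learning update reduces
        to the same incremental rule: the gamma * max_a' Q(s', a') term drops
        because there is no successor state after a route is chosen."""
        pass

    if __name__ == "__main__":
        agents = [EpsilonGreedyBandit() for _ in range(50)]
        for episode in range(200):
            choices = [a.choose() for a in agents]
            for agent, route in zip(agents, choices):
                load = choices.count(route)
                agent.update(route, -travel_time(route, load))
        # average learned value per route across agents
        print({r: sum(a.q[r] for a in agents) / len(agents) for r in ROUTES})

The point of the sketch is that, in this stateless repeated-game setting, the Q-learning update coincides with a constant-step-size bandit update, so the comparison in the paper hinges on exploration and adaptation to non-stationarity rather than on the update rule itself.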