Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems

Cited by: 0
Authors
Even-Dar, Eyal
Mannor, Shie
Mansour, Yishay
Affiliations
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[2] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 2A7, Canada
[3] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
Keywords
DOI: not available
Chinese Library Classification (CLC): TP [Automation & Computer Technology]
Discipline Code: 0812
Abstract
We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability at least 1 − δ. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms. We describe a framework that is based on learning the confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability). We provide model-based and model-free variants of the elimination method. We further derive stopping conditions guaranteeing that the learned policy is approximately optimal with high probability. Simulations demonstrate a considerable speedup and added robustness over ε-greedy Q-learning.
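The bandit result in the abstract rests on a successive-elimination idea: keep a confidence interval around each arm's empirical mean, and discard any arm whose upper bound falls below the best arm's lower bound. The sketch below illustrates that idea only; the `pull` interface, the Hoeffding-style confidence radius, and the stopping test are illustrative assumptions, not the paper's exact algorithm or constants.

```python
import math

def successive_elimination(pull, n_arms, epsilon=0.1, delta=0.05, max_rounds=10_000):
    """Return an arm that is (with high probability) epsilon-optimal.

    `pull(i)` must return a reward in [0, 1] for arm i. Arms are
    eliminated once their upper confidence bound trails the leader's
    lower confidence bound (illustrative sketch, not the paper's
    exact procedure).
    """
    surviving = list(range(n_arms))
    means = [0.0] * n_arms
    counts = [0] * n_arms
    for t in range(1, max_rounds + 1):
        # Pull every surviving arm once and update its running mean.
        for i in surviving:
            counts[i] += 1
            means[i] += (pull(i) - means[i]) / counts[i]
        # Hoeffding-style radius, union-bounded over arms and rounds.
        radius = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        best = max(means[i] for i in surviving)
        # Eliminate arms confidently worse than the current leader.
        surviving = [i for i in surviving if means[i] + radius >= best - radius]
        # Stop when one arm remains or all survivors are epsilon-close.
        if len(surviving) == 1 or 2 * radius <= epsilon:
            break
    return max(surviving, key=lambda i: means[i])
```

Because suboptimal arms stop being sampled once eliminated, the total number of pulls concentrates on the near-optimal arms, which is the source of the speedup the abstract reports over ε-greedy exploration.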
Pages: 1079 - 1105
Page count: 27