Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems

Cited by: 0
Authors
Even-Dar, Eyal
Mannor, Shie
Mansour, Yishay
Affiliations
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[2] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 2A7, Canada
[3] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
Keywords
DOI
Not available
CLC number
TP [automation and computer technology];
Discipline classification code
0812
Abstract
We incorporate statistical confidence intervals into both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that, given n arms, it suffices to pull the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability at least 1 − δ. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms. We describe a framework that is based on learning the confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability). We provide model-based and model-free variants of the elimination method. We further derive stopping conditions that guarantee that the learned policy is approximately optimal with high probability. Simulations demonstrate a considerable speedup and added robustness over ε-greedy Q-learning.
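As an illustration of the elimination idea summarized in the abstract, the following Python sketch implements a generic successive-elimination loop with Hoeffding-style confidence intervals for the bandit setting. It is a minimal sketch rather than the paper's exact algorithm: the function name action_elimination_bandit, the reward oracle pull, the confidence-radius constants, and the assumption that rewards lie in [0, 1] are choices made for this example.

```python
import math
import random


def action_elimination_bandit(pull, n_arms, epsilon=0.1, delta=0.05, max_rounds=100000):
    """Illustrative successive-elimination loop for a stochastic bandit.

    pull(i) is assumed to return a reward in [0, 1] for arm i.
    Arms whose upper confidence bound falls below the best lower confidence
    bound are eliminated; the loop stops when every surviving arm is
    epsilon-optimal with high probability (or only one arm remains).
    """
    active = set(range(n_arms))
    counts = [0] * n_arms
    means = [0.0] * n_arms

    for _ in range(max_rounds):
        # Pull every surviving arm once per round and update its running mean.
        for i in active:
            reward = pull(i)
            counts[i] += 1
            means[i] += (reward - means[i]) / counts[i]

        # Hoeffding-style confidence radius; the log term reflects a union
        # bound over arms and rounds (constants are illustrative, not tuned).
        def radius(i):
            return math.sqrt(math.log(4.0 * n_arms * counts[i] ** 2 / delta) / (2.0 * counts[i]))

        best_lcb = max(means[i] - radius(i) for i in active)

        # Eliminate arms whose upper confidence bound falls below the best
        # lower confidence bound: they are suboptimal with high probability.
        active = {i for i in active if means[i] + radius(i) >= best_lcb}

        # Stopping condition: every surviving arm is within epsilon of the
        # best arm with high probability, so any of them can be returned.
        if len(active) == 1 or all(2.0 * radius(i) <= epsilon for i in active):
            break

    # Return the empirically best surviving arm.
    return max(active, key=lambda i: means[i])


# Hypothetical usage with Bernoulli arms whose true means are made up here:
if __name__ == "__main__":
    true_means = [0.2, 0.5, 0.55, 0.8]
    best = action_elimination_bandit(lambda i: float(random.random() < true_means[i]), len(true_means))
    print("selected arm:", best)
```

The same confidence-interval idea is what the abstract's model-based and model-free reinforcement learning variants apply per state-action pair to the value function or Q-function, with the stopping conditions playing the role of the termination test above.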
Pages: 1079-1105
Number of pages: 27