Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems

Cited by: 0
Authors
Even-Dar, Eyal
Mannor, Shie
Mansour, Yishay
Affiliations
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[2] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 2A7, Canada
[3] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
Keywords
DOI
None available
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of O((n/epsilon^2) log(1/delta)) times to find an epsilon-optimal arm with probability of at least 1 - delta. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms. We describe a framework that is based on learning the confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability). We provide model-based and model-free variants of the elimination method. We further derive stopping conditions guaranteeing that the learned policy is approximately optimal with high probability. Simulations demonstrate a considerable speedup and added robustness over epsilon-greedy Q-learning.
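As a rough illustration of the bandit-side procedure, the following Python sketch implements successive action elimination with Hoeffding confidence intervals. The specific union-bound schedule and stopping rule below are common textbook choices assumed for the example, not the paper's exact constants.

```python
import math

def successive_elimination(arms, epsilon, delta):
    """Sketch of confidence-interval action elimination for best-arm
    identification: repeatedly sample every surviving arm, then drop any
    arm whose confidence interval lies entirely below the leader's.
    `arms` is a list of zero-argument callables returning rewards in [0, 1]."""
    n = len(arms)
    active = list(range(n))
    means = [0.0] * n
    t = 0  # number of pulls made of each surviving arm
    while len(active) > 1:
        t += 1
        for i in active:
            means[i] += (arms[i]() - means[i]) / t  # incremental mean
        # Hoeffding radius with a union bound over arms and rounds
        # (one standard choice of schedule, not the paper's constants)
        radius = math.sqrt(math.log(4 * n * t * t / delta) / (2 * t))
        if 2 * radius < epsilon:
            break  # every surviving arm is epsilon-optimal w.h.p.
        leader = max(means[i] for i in active)
        # eliminate arms confidently worse than the empirical leader
        active = [i for i in active if leader - means[i] <= 2 * radius]
    return max(active, key=lambda i: means[i])
```

Each surviving arm is pulled O((1/Delta_i^2) log(...)) times before its gap Delta_i exceeds twice the confidence radius and it is eliminated, which is how the O((n/epsilon^2) log(1/delta)) total sample complexity arises.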
Pages: 1079 - 1105
Page count: 27
Related papers
50 items total
  • [21] Adaptive Active Learning as a Multi-armed Bandit Problem
    Czarnecki, Wojciech M.
    Podolak, Igor T.
    21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 989 - 990
  • [22] Multi-armed Bandit Algorithms for Adaptive Learning: A Survey
    Mui, John
    Lin, Fuhua
    Dewan, M. Ali Akber
    ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT II, 2021, 12749 : 273 - 278
  • [23] Transfer Learning in Multi-Armed Bandit: A Causal Approach
    Zhang, Junzhe
    Bareinboim, Elias
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 1778 - 1780
  • [24] Distributed Learning in Multi-Armed Bandit With Multiple Players
    Liu, Keqin
    Zhao, Qing
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2010, 58 (11) : 5667 - 5681
  • [25] A Satisficing Strategy with Variable Reference in the Multi-armed Bandit Problems
    Kohno, Yu
    Takahashi, Tatsuji
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE OF NUMERICAL ANALYSIS AND APPLIED MATHEMATICS 2014 (ICNAAM-2014), 2015, 1648
  • [26] GAUSSIAN PROCESS MODELLING OF DEPENDENCIES IN MULTI-ARMED BANDIT PROBLEMS
    Dorard, Louis
    Glowacka, Dorota
    Shawe-Taylor, John
    PROCEEDINGS OF THE 10TH INTERNATIONAL SYMPOSIUM ON OPERATIONAL RESEARCH SOR 09, 2009, : 77 - 84
  • [27] Time-Varying Stochastic Multi-Armed Bandit Problems
    Vakili, Sattar
    Zhao, Qing
    Zhou, Yuan
    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2014, : 2103 - 2107
  • [28] Synchronization and optimality for multi-armed bandit problems in continuous time
    ElKaroui, N
    Karatzas, I
    COMPUTATIONAL & APPLIED MATHEMATICS, 1997, 16 (02): : 117 - 151
  • [30] Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems
    Vakili, Sattar
    Liu, Keqin
    Zhao, Qing
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2013, 7 (05) : 759 - 767