Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems

Cited by: 0
Authors
Even-Dar, Eyal
Mannor, Shie
Mansour, Yishay
Affiliations
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[2] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 2A7, Canada
[3] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
Keywords
DOI
None available
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of O((n/epsilon^2) log(1/delta)) times to find an epsilon-optimal arm with probability of at least 1 - delta. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms. We describe a framework that is based on learning the confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability). We provide model-based and model-free variants of the elimination method. We further derive stopping conditions guaranteeing that the learned policy is approximately optimal with high probability. Simulations demonstrate a considerable speedup and added robustness over epsilon-greedy Q-learning.
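As a rough illustration of the bandit-side procedure, the following Python sketch implements successive action elimination with Hoeffding confidence intervals. The specific union-bound schedule and stopping rule below are common textbook choices assumed for the example, not the paper's exact constants.

```python
import math

def successive_elimination(arms, epsilon, delta):
    """Sketch of confidence-interval action elimination for best-arm
    identification: repeatedly sample every surviving arm, then drop any
    arm whose confidence interval lies entirely below the leader's.
    `arms` is a list of zero-argument callables returning rewards in [0, 1]."""
    n = len(arms)
    active = list(range(n))
    means = [0.0] * n
    t = 0  # number of pulls made of each surviving arm
    while len(active) > 1:
        t += 1
        for i in active:
            means[i] += (arms[i]() - means[i]) / t  # incremental mean
        # Hoeffding radius with a union bound over arms and rounds
        # (one standard choice of schedule, not the paper's constants)
        radius = math.sqrt(math.log(4 * n * t * t / delta) / (2 * t))
        if 2 * radius < epsilon:
            break  # every surviving arm is epsilon-optimal w.h.p.
        leader = max(means[i] for i in active)
        # eliminate arms confidently worse than the empirical leader
        active = [i for i in active if leader - means[i] <= 2 * radius]
    return max(active, key=lambda i: means[i])
```

Each surviving arm is pulled O((1/Delta_i^2) log(...)) times before its gap Delta_i exceeds twice the confidence radius and it is eliminated, which is how the O((n/epsilon^2) log(1/delta)) total sample complexity arises.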
Pages: 1079 - 1105
Page count: 27
Related papers
50 items total
  • [21] Adaptive Active Learning as a Multi-armed Bandit Problem
    Czarnecki, Wojciech M.
    Podolak, Igor T.
    21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 989 - 990
  • [22] Multi-armed Bandit Algorithms for Adaptive Learning: A Survey
    Mui, John
    Lin, Fuhua
    Dewan, M. Ali Akber
    ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT II, 2021, 12749 : 273 - 278
  • [23] Transfer Learning in Multi-Armed Bandit: A Causal Approach
    Zhang, Junzhe
    Bareinboim, Elias
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 1778 - 1780
  • [24] Distributed Learning in Multi-Armed Bandit With Multiple Players
    Liu, Keqin
    Zhao, Qing
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2010, 58 (11) : 5667 - 5681
  • [25] A Satisficing Strategy with Variable Reference in the Multi-armed Bandit Problems
    Kohno, Yu
    Takahashi, Tatsuji
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE OF NUMERICAL ANALYSIS AND APPLIED MATHEMATICS 2014 (ICNAAM-2014), 2015, 1648
  • [26] GAUSSIAN PROCESS MODELLING OF DEPENDENCIES IN MULTI-ARMED BANDIT PROBLEMS
    Dorard, Louis
    Glowacka, Dorota
    Shawe-Taylor, John
    PROCEEDINGS OF THE 10TH INTERNATIONAL SYMPOSIUM ON OPERATIONAL RESEARCH SOR 09, 2009, : 77 - 84
  • [27] Time-Varying Stochastic Multi-Armed Bandit Problems
    Vakili, Sattar
    Zhao, Qing
    Zhou, Yuan
    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2014, : 2103 - 2107
  • [28] Synchronization and optimality for multi-armed bandit problems in continuous time
    ElKaroui, N
    Karatzas, I
    COMPUTATIONAL & APPLIED MATHEMATICS, 1997, 16 (02): : 117 - 151
  • [30] Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems
    Vakili, Sattar
    Liu, Keqin
    Zhao, Qing
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2013, 7 (05) : 759 - 767