Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems

Cited by: 0
Authors
Even-Dar, Eyal
Mannor, Shie
Mansour, Yishay
Affiliations
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[2] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 2A7, Canada
[3] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
Keywords
DOI
Not available
CLC number
TP [automation and computer technology];
Discipline classification code
0812
Abstract
We incorporate statistical confidence intervals into both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that, given n arms, it suffices to pull the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability at least 1 − δ. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms. We describe a framework that is based on learning the confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability). We provide model-based and model-free variants of the elimination method. We further derive stopping conditions that guarantee that the learned policy is approximately optimal with high probability. Simulations demonstrate a considerable speedup and added robustness over ε-greedy Q-learning.
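As an illustration of the elimination idea summarized in the abstract, the following Python sketch implements a generic successive-elimination loop with Hoeffding-style confidence intervals for the bandit setting. It is a minimal sketch rather than the paper's exact algorithm: the function name action_elimination_bandit, the reward oracle pull, the confidence-radius constants, and the assumption that rewards lie in [0, 1] are choices made for this example.

```python
import math
import random


def action_elimination_bandit(pull, n_arms, epsilon=0.1, delta=0.05, max_rounds=100000):
    """Illustrative successive-elimination loop for a stochastic bandit.

    pull(i) is assumed to return a reward in [0, 1] for arm i.
    Arms whose upper confidence bound falls below the best lower confidence
    bound are eliminated; the loop stops when every surviving arm is
    epsilon-optimal with high probability (or only one arm remains).
    """
    active = set(range(n_arms))
    counts = [0] * n_arms
    means = [0.0] * n_arms

    for _ in range(max_rounds):
        # Pull every surviving arm once per round and update its running mean.
        for i in active:
            reward = pull(i)
            counts[i] += 1
            means[i] += (reward - means[i]) / counts[i]

        # Hoeffding-style confidence radius; the log term reflects a union
        # bound over arms and rounds (constants are illustrative, not tuned).
        def radius(i):
            return math.sqrt(math.log(4.0 * n_arms * counts[i] ** 2 / delta) / (2.0 * counts[i]))

        best_lcb = max(means[i] - radius(i) for i in active)

        # Eliminate arms whose upper confidence bound falls below the best
        # lower confidence bound: they are suboptimal with high probability.
        active = {i for i in active if means[i] + radius(i) >= best_lcb}

        # Stopping condition: every surviving arm is within epsilon of the
        # best arm with high probability, so any of them can be returned.
        if len(active) == 1 or all(2.0 * radius(i) <= epsilon for i in active):
            break

    # Return the empirically best surviving arm.
    return max(active, key=lambda i: means[i])


# Hypothetical usage with Bernoulli arms whose true means are made up here:
if __name__ == "__main__":
    true_means = [0.2, 0.5, 0.55, 0.8]
    best = action_elimination_bandit(lambda i: float(random.random() < true_means[i]), len(true_means))
    print("selected arm:", best)
```

The same confidence-interval idea is what the abstract's model-based and model-free reinforcement learning variants apply per state-action pair to the value function or Q-function, with the stopping conditions playing the role of the termination test above.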
Pages: 1079-1105
Number of pages: 27