Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems

被引：0

作者：

Even-Dar, Eyal

Mannor, Shie

Mansour, Yishay

机构：

[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA

[2] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 2A7, Canada

[3] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel

来源：

JOURNAL OF MACHINE LEARNING RESEARCH | 2006年 / 7卷

关键词：

D O I：

暂无

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of O((n/epsilon(2)) log(1/delta)) times to find an epsilon-optimal arm with probability of at least 1-delta. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms. We describe a framework that is based on learning the confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability). We provide a model-based and a model-free variants of the elimination method. We further derive stopping conditions guaranteeing that the learned policy is approximately optimal with high probability. Simulations demonstrate a considerable speedup and added robustness over epsilon-greedy Q-learning.

引用

页码：1079 / 1105

页数：27

共 50 条

[31] The Effect of Communication on Noncooperative Multiplayer Multi-Armed Bandit Problems
Evirgen, Noyan
Kose, Alper
2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017, : 331 - 336
[32] An asymptotically optimal strategy for constrained multi-armed bandit problems
Hyeong Soo Chang
Mathematical Methods of Operations Research, 2020, 91 : 545 - 557
[33] On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems
Kim, Baekjin
Tewari, Ambuj
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[34] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Bubeck, Sebastien
Cesa-Bianchi, Nicolo
FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2012, 5 (01): : 1 - 122
[35] Dynamic Multi-Armed Bandit with Covariates
Pavlidis, Nicos G.
Tasoulis, Dimitris K.
Adams, Niall M.
Hand, David J.
ECAI 2008, PROCEEDINGS, 2008, 178 : 777 - +
[36] Scaling Multi-Armed Bandit Algorithms
Fouche, Edouard
Komiyama, Junpei
Boehm, Klemens
KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, : 1449 - 1459
[37] The budgeted multi-armed bandit problem
Madani, O
Lizotte, DJ
Greiner, R
LEARNING THEORY, PROCEEDINGS, 2004, 3120 : 643 - 645
[38] The Multi-Armed Bandit With Stochastic Plays
Lesage-Landry, Antoine
Taylor, Joshua A.
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63 (07) : 2280 - 2286
[39] Achieving Regular and Fair Learning in Combinatorial Multi-Armed Bandit
Wu, Xiaoyi
Li, Bin
IEEE INFOCOM 2024-IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2024, : 361 - 370
[40] Multi-armed Bandit with Additional Observations
Yun, Donggyu
Proutiere, Alexandre
Ahn, Sumyeong
Shin, Jinwoo
Yi, Yung
PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2018, 2 (01)

← 1 2 3 4 5 →