Regret bounds for sleeping experts and bandits

Cited by: 0
Authors
Robert Kleinberg
Alexandru Niculescu-Mizil
Yogeshwer Sharma
Affiliations
[1] Cornell University,Department of Computer Science
[2] IBM T.J. Watson Research Center,Mathematical Sciences Department
Source
Machine Learning | 2010, Vol. 80
Keywords
Online algorithms; Computational learning theory; Regret
DOI
Not available
Abstract
We study on-line decision problems where the set of actions that are available to the decision algorithm varies over time. With a few notable exceptions, such problems remained largely unaddressed in the literature, despite their applicability to a large number of practical problems. Departing from previous work on this “Sleeping Experts” problem, we compare algorithms against the payoff obtained by the best ordering of the actions, which is a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings and consider both stochastic and adversarial reward models. For all settings we give algorithms achieving (almost) information-theoretically optimal regret bounds (up to a constant or a sub-logarithmic factor) with respect to the best-ordering benchmark.
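The best-ordering benchmark from the abstract can be illustrated concretely: fix an ordering of all actions, and in each round play the highest-ranked action among those currently available ("awake"); the benchmark is the payoff of the best such ordering in hindsight. The sketch below computes it by brute force for a toy stochastic instance. This is not the paper's algorithm; the reward model, availability probabilities, and all names here are illustrative assumptions.

```python
# Minimal sketch of the best-ordering benchmark (not the paper's algorithm).
# Assumptions for illustration: Bernoulli rewards with fixed means, and each
# action independently awake with probability 0.5 in every round.
import itertools
import random

random.seed(0)
n_actions, n_rounds = 4, 200
means = [0.2, 0.5, 0.7, 0.9]  # per-action Bernoulli reward means

history = []  # list of (awake_actions, rewards) per round
for _ in range(n_rounds):
    awake = [a for a in range(n_actions) if random.random() < 0.5]
    while not awake:  # re-sample so at least one action is available
        awake = [a for a in range(n_actions) if random.random() < 0.5]
    rewards = [1.0 if random.random() < means[a] else 0.0
               for a in range(n_actions)]
    history.append((awake, rewards))

def ordering_payoff(order):
    """Payoff of always playing the highest-ranked awake action."""
    total = 0.0
    for awake, rewards in history:
        chosen = min(awake, key=order.index)  # first awake action in `order`
        total += rewards[chosen]
    return total

# Brute force over all orderings (feasible only for tiny n_actions).
best_order = max(itertools.permutations(range(n_actions)),
                 key=ordering_payoff)
benchmark = ordering_payoff(best_order)

# Regret of a naive uniform-random policy relative to the benchmark.
naive = sum(r[random.choice(aw)] for aw, r in history)
print(f"best ordering: {best_order}, benchmark payoff: {benchmark:.0f}")
print(f"naive payoff: {naive:.0f}, regret: {benchmark - naive:.0f}")
```

With stochastic rewards one expects the best ordering to rank actions by their means; the paper's algorithms aim to match this benchmark's payoff without knowing the means or the orderings in advance.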
Pages: 245–272
Page count: 27