Regret bounds for sleeping experts and bandits

Cited by: 99
Authors:
Kleinberg, Robert [1 ]
Niculescu-Mizil, Alexandru [1 ,2 ]
Sharma, Yogeshwer [1 ]
Affiliations:
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
[2] IBM Corp, TJ Watson Res Ctr, Dept Math Sci, Yorktown Hts, NY 10598 USA
Funding: U.S. National Science Foundation (NSF)
Keywords:
Online algorithms; Computational learning theory; Regret; GAME;
DOI: 10.1007/s10994-010-5178-7
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
We study on-line decision problems where the set of actions available to the decision algorithm varies over time. With a few notable exceptions, such problems have remained largely unaddressed in the literature, despite their applicability to a large number of practical problems. Departing from previous work on this "Sleeping Experts" problem, we compare algorithms against the payoff obtained by the best ordering of the actions, which is a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings and consider both stochastic and adversarial reward models. For all settings we give algorithms achieving (almost) information-theoretically optimal regret bounds (up to a constant or a sub-logarithmic factor) with respect to the best-ordering benchmark.
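To make the stochastic sleeping-bandit setting in the abstract concrete, the sketch below plays, in each round, the awake arm with the highest upper-confidence index. This is an illustrative sketch only (the function names and simulation are invented here, not taken from the paper), written in the spirit of UCB-style index policies for sleeping bandits rather than as the paper's exact algorithm.

```python
import math

def sleeping_ucb(reward_fn, awake_sets):
    """Sleeping bandit sketch: in round t only the arms in awake_sets[t]
    may be played; pick the awake arm with the highest UCB index.
    Hypothetical illustration, not the paper's exact algorithm."""
    n_arms = 1 + max(a for awake in awake_sets for a in awake)
    counts = [0] * n_arms    # number of pulls per arm
    sums = [0.0] * n_arms    # cumulative reward per arm
    choices = []
    for t, awake in enumerate(awake_sets, start=1):
        untried = [a for a in awake if counts[a] == 0]
        if untried:
            # Explore: play any awake arm that has never been pulled.
            arm = untried[0]
        else:
            # Exploit: awake arm maximizing empirical mean + confidence bonus.
            arm = max(awake, key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = reward_fn(arm)
        counts[arm] += 1
        sums[arm] += r
        choices.append(arm)
    return choices
```

Note that the availability sets constrain every choice, which is why the natural benchmark is the best fixed *ordering* of arms (play the highest-ranked awake arm) rather than the single best arm, which may simply be asleep.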
Pages: 245-272 (28 pages)
Related papers (50 total):
  • [1] Regret bounds for sleeping experts and bandits
    Robert Kleinberg
    Alexandru Niculescu-Mizil
    Yogeshwer Sharma
    [J]. Machine Learning, 2010, 80 : 245 - 272
  • [2] Regret Bounds for Batched Bandits
    Esfandiari, Hossein
    Karbasi, Amin
    Mehrabian, Abbas
    Mirrokni, Vahab
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7340 - 7348
  • [3] Regret bounds for restless Markov bandits
    Ortner, Ronald
    Ryabko, Daniil
    Auer, Peter
    Munos, Remi
    [J]. THEORETICAL COMPUTER SCIENCE, 2014, 558 : 62 - 76
  • [4] Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes
    Sharma, Nihal
    Sen, Rajat
    Basu, Soumya
    Shanmugam, Karthikeyan
    Shakkottai, Sanjay
    [J]. ACM TRANSACTIONS ON MODELING AND PERFORMANCE EVALUATION OF COMPUTING SYSTEMS, 2024, 9 (03)
  • [5] Optimal Regret Bounds for Collaborative Learning in Bandits
    Shidani, Amitis
    Vakili, Sattar
    [J]. INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237, 2024, 237
  • [6] Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits
    Ito, Shinji
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [7] Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms
    Combes, Richard
    Proutiere, Alexandre
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
  • [8] On Information Gain and Regret Bounds in Gaussian Process Bandits
    Vakili, Sattar
    Khezeli, Kia
    Picheny, Victor
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 82 - +
  • [9] Improved Path-length Regret Bounds for Bandits
    Bubeck, Sebastien
    Li, Yuanzhi
    Luo, Haipeng
    Wei, Chen-Yu
    [J]. CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
  • [10] Improved Regret Bounds for Tracking Experts with Memory
    Robinson, James
    Herbster, Mark
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34