Regret bounds for sleeping experts and bandits

Cited by: 99
Authors:
Kleinberg, Robert [1 ]
Niculescu-Mizil, Alexandru [1 ,2 ]
Sharma, Yogeshwer [1 ]
Affiliations:
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
[2] IBM Corp, TJ Watson Res Ctr, Dept Math Sci, Yorktown Hts, NY 10598 USA
Funding: U.S. National Science Foundation (NSF)
Keywords:
Online algorithms; Computational learning theory; Regret; GAME;
DOI: 10.1007/s10994-010-5178-7
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
We study on-line decision problems where the set of actions available to the decision algorithm varies over time. With a few notable exceptions, such problems have remained largely unaddressed in the literature, despite their applicability to a large number of practical problems. Departing from previous work on this "Sleeping Experts" problem, we compare algorithms against the payoff obtained by the best ordering of the actions, which is a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings and consider both stochastic and adversarial reward models. For all settings we give algorithms achieving (almost) information-theoretically optimal regret bounds (up to a constant or a sub-logarithmic factor) with respect to the best-ordering benchmark.
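To make the stochastic sleeping-bandit setting in the abstract concrete, the sketch below plays, in each round, the awake arm with the highest upper-confidence index. This is an illustrative sketch only (the function names and simulation are invented here, not taken from the paper), written in the spirit of UCB-style index policies for sleeping bandits rather than as the paper's exact algorithm.

```python
import math

def sleeping_ucb(reward_fn, awake_sets):
    """Sleeping bandit sketch: in round t only the arms in awake_sets[t]
    may be played; pick the awake arm with the highest UCB index.
    Hypothetical illustration, not the paper's exact algorithm."""
    n_arms = 1 + max(a for awake in awake_sets for a in awake)
    counts = [0] * n_arms    # number of pulls per arm
    sums = [0.0] * n_arms    # cumulative reward per arm
    choices = []
    for t, awake in enumerate(awake_sets, start=1):
        untried = [a for a in awake if counts[a] == 0]
        if untried:
            # Explore: play any awake arm that has never been pulled.
            arm = untried[0]
        else:
            # Exploit: awake arm maximizing empirical mean + confidence bonus.
            arm = max(awake, key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = reward_fn(arm)
        counts[arm] += 1
        sums[arm] += r
        choices.append(arm)
    return choices
```

Note that the availability sets constrain every choice, which is why the natural benchmark is the best fixed *ordering* of arms (play the highest-ranked awake arm) rather than the single best arm, which may simply be asleep.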
Pages: 245-272 (28 pages)
Related papers (50 total):
  • [1] Regret bounds for sleeping experts and bandits
    Robert Kleinberg
    Alexandru Niculescu-Mizil
    Yogeshwer Sharma
    [J]. Machine Learning, 2010, 80 : 245 - 272
  • [2] Regret Bounds for Batched Bandits
    Esfandiari, Hossein
    Karbasi, Amin
    Mehrabian, Abbas
    Mirrokni, Vahab
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7340 - 7348
  • [3] Regret bounds for restless Markov bandits
    Ortner, Ronald
    Ryabko, Daniil
    Auer, Peter
    Munos, Remi
    [J]. THEORETICAL COMPUTER SCIENCE, 2014, 558 : 62 - 76
  • [4] Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes
    Sharma, Nihal
    Sen, Rajat
    Basu, Soumya
    Shanmugam, Karthikeyan
    Shakkottai, Sanjay
    [J]. ACM TRANSACTIONS ON MODELING AND PERFORMANCE EVALUATION OF COMPUTING SYSTEMS, 2024, 9 (03)
  • [5] Optimal Regret Bounds for Collaborative Learning in Bandits
    Shidani, Amitis
    Vakili, Sattar
    [J]. INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237, 2024, 237
  • [6] Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits
    Ito, Shinji
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [7] Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms
    Combes, Richard
    Proutiere, Alexandre
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
  • [8] On Information Gain and Regret Bounds in Gaussian Process Bandits
    Vakili, Sattar
    Khezeli, Kia
    Picheny, Victor
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 82 - +
  • [9] Improved Path-length Regret Bounds for Bandits
    Bubeck, Sebastien
    Li, Yuanzhi
    Luo, Haipeng
    Wei, Chen-Yu
    [J]. CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
  • [10] Improved Regret Bounds for Tracking Experts with Memory
    Robinson, James
    Herbster, Mark
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34