Regret bounds for sleeping experts and bandits

Cited by: 0
Authors
Robert Kleinberg
Alexandru Niculescu-Mizil
Yogeshwer Sharma
Affiliations
[1] Cornell University,Department of Computer Science
[2] IBM T.J. Watson Research Center,Mathematical Sciences Department
Source
Machine Learning | 2010, Vol. 80
Keywords
Online algorithms; Computational learning theory; Regret
DOI
Not available
Abstract
We study on-line decision problems where the set of actions that are available to the decision algorithm varies over time. With a few notable exceptions, such problems remained largely unaddressed in the literature, despite their applicability to a large number of practical problems. Departing from previous work on this “Sleeping Experts” problem, we compare algorithms against the payoff obtained by the best ordering of the actions, which is a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings and consider both stochastic and adversarial reward models. For all settings we give algorithms achieving (almost) information-theoretically optimal regret bounds (up to a constant or a sub-logarithmic factor) with respect to the best-ordering benchmark.
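The best-ordering benchmark from the abstract can be illustrated concretely: fix an ordering of all actions, and in each round play the highest-ranked action among those currently available ("awake"); the benchmark is the payoff of the best such ordering in hindsight. The sketch below computes it by brute force for a toy stochastic instance. This is not the paper's algorithm; the reward model, availability probabilities, and all names here are illustrative assumptions.

```python
# Minimal sketch of the best-ordering benchmark (not the paper's algorithm).
# Assumptions for illustration: Bernoulli rewards with fixed means, and each
# action independently awake with probability 0.5 in every round.
import itertools
import random

random.seed(0)
n_actions, n_rounds = 4, 200
means = [0.2, 0.5, 0.7, 0.9]  # per-action Bernoulli reward means

history = []  # list of (awake_actions, rewards) per round
for _ in range(n_rounds):
    awake = [a for a in range(n_actions) if random.random() < 0.5]
    while not awake:  # re-sample so at least one action is available
        awake = [a for a in range(n_actions) if random.random() < 0.5]
    rewards = [1.0 if random.random() < means[a] else 0.0
               for a in range(n_actions)]
    history.append((awake, rewards))

def ordering_payoff(order):
    """Payoff of always playing the highest-ranked awake action."""
    total = 0.0
    for awake, rewards in history:
        chosen = min(awake, key=order.index)  # first awake action in `order`
        total += rewards[chosen]
    return total

# Brute force over all orderings (feasible only for tiny n_actions).
best_order = max(itertools.permutations(range(n_actions)),
                 key=ordering_payoff)
benchmark = ordering_payoff(best_order)

# Regret of a naive uniform-random policy relative to the benchmark.
naive = sum(r[random.choice(aw)] for aw, r in history)
print(f"best ordering: {best_order}, benchmark payoff: {benchmark:.0f}")
print(f"naive payoff: {naive:.0f}, regret: {benchmark - naive:.0f}")
```

With stochastic rewards one expects the best ordering to rank actions by their means; the paper's algorithms aim to match this benchmark's payoff without knowing the means or the orderings in advance.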
Pages: 245–272
Page count: 27