No-Regret Linear Bandits beyond Realizability

Cited by: 0
Authors:
Liu, Chong [1]
Yin, Ming [1]
Wang, Yu-Xiang [1]
Affiliation:
[1] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
DOI: none
CLC number: TP18 (Artificial Intelligence Theory)
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter epsilon that measures the sup-norm error of the best linear approximation, which results in an unavoidable linear regret whenever epsilon > 0. We describe a more natural model of misspecification that only requires the approximation error at each input x to be proportional to the suboptimality gap at x. It captures the intuition that, for optimization problems, near-optimal regions matter more, so larger approximation errors can be tolerated in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm, designed for the realizable case, is automatically robust against such gap-adjusted misspecification: it achieves a near-optimal √T regret for problems where the best previously known regret is almost linear in the time horizon T. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
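For orientation, a rough sketch of the classical LinUCB loop that the abstract refers to is given below. This is an illustrative toy, not code from the paper: the finite action set, the slightly nonlinear reward function, and all parameter values (ridge penalty, bonus scale, horizon) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, lam, beta = 2, 500, 1.0, 1.0      # dimension, horizon, ridge penalty, bonus scale
arms = rng.normal(size=(20, d))         # hypothetical fixed finite action set
theta_star = np.array([1.0, 0.5])       # unknown linear component of the reward

def reward(x):
    # Toy misspecified reward: linear part plus a small nonlinear term and noise.
    return x @ theta_star + 0.05 * np.sin(x.sum()) + 0.1 * rng.normal()

V = lam * np.eye(d)                     # regularized design (Gram) matrix
b = np.zeros(d)                         # accumulated reward-weighted features
total_reward = 0.0
for t in range(T):
    theta_hat = np.linalg.solve(V, b)   # ridge-regression estimate of theta
    Vinv = np.linalg.inv(V)
    # Optimistic index: estimated reward plus an exploration bonus
    # proportional to the elliptical norm ||x||_{V^{-1}}.
    ucb = arms @ theta_hat + beta * np.sqrt(
        np.einsum('id,dk,ik->i', arms, Vinv, arms))
    x = arms[np.argmax(ucb)]            # play the optimistic action
    r = reward(x)
    V += np.outer(x, x)                 # rank-one update of the design matrix
    b += r * x
    total_reward += r
```

The point of the paper is that this unmodified loop, which assumes realizability, still enjoys sublinear regret under the gap-adjusted misspecification model described in the abstract.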
Pages: 1294-1303 (10 pages)