No-Regret Linear Bandits beyond Realizability

Times cited: 0
Authors
Liu, Chong [1]
Yin, Ming [1]
Wang, Yu-Xiang [1]
Affiliation
[1] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
DOI: Not available
Chinese Library Classification (CLC): TP18 [Theory of Artificial Intelligence]
Discipline classification codes: 081104; 0812; 0835; 1405
Abstract
We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter ε that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever ε > 0. We describe a more natural model of misspecification, which only requires the approximation error at each input x to be proportional to the suboptimality gap at x. It captures the intuition that, for optimization problems, near-optimal regions should matter more, and we can tolerate larger approximation errors in suboptimal regions. Quite surprisingly, we show that the classical Lin-UCB algorithm, designed for the realizable case, is automatically robust against such gap-adjusted misspecification. It achieves a near-optimal √T regret for problems where the best-known regret is almost linear in the time horizon T. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
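The abstract does not spell out the algorithm or the exact definition, so what follows is a minimal Python sketch of the setting it describes. It is not the authors' code. It runs standard Lin-UCB on a finite arm set whose true reward f is non-linear. As an assumption, the gap-adjusted condition is written here as |f(x) - θ·x| ≤ ρ (f* - f(x)) for every arm x, where f* is the best reward. All of the following are illustrative choices, not taken from the paper:

- the particular construction of f;
- the fixed confidence width `beta`;
- the ridge parameter `lam`;
- the Gaussian noise level;
- every function name.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def make_gap_adjusted_rewards(theta, arms, rho):
    # Hypothetical construction (an assumption, not from the paper):
    # f(x) = theta.x - c(x) * g(x), where g(x) = max_x' theta.x' - theta.x is the
    # gap of the linear proxy and c(x) in [0, rho] is an arbitrary non-linear weight.
    # Then f* = max theta.x and |f(x) - theta.x| = c(x) g(x) <= rho * (f* - f(x)).
    lin = [dot(theta, x) for x in arms]
    best = max(lin)
    return [v - rho * abs(math.sin(7.0 * sum(x))) * (best - v)
            for x, v in zip(arms, lin)]


def is_gap_adjusted(rewards, theta, arms, rho, tol=1e-12):
    # Check the assumed condition |f(x) - theta.x| <= rho * (f* - f(x)) on every arm.
    f_star = max(rewards)
    return all(abs(f - dot(theta, x)) <= rho * (f_star - f) + tol
               for f, x in zip(rewards, arms))


def lin_ucb(arms, rewards, horizon, beta=1.0, lam=1.0, noise=0.1, seed=0):
    # Plain Lin-UCB with a fixed confidence width beta (an assumption; analyses
    # typically use a slowly growing beta_t). It never looks at the misspecification.
    rng = random.Random(seed)
    d = len(arms[0])
    v_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    f_star = max(rewards)
    regret, curve = 0.0, []
    for _ in range(horizon):
        theta_hat = mat_vec(v_inv, b)  # ridge estimate V^{-1} b
        scores = [dot(theta_hat, x)
                  + beta * math.sqrt(max(0.0, dot(x, mat_vec(v_inv, x))))
                  for x in arms]
        k = max(range(len(arms)), key=scores.__getitem__)  # optimistic arm
        x = arms[k]
        r = rewards[k] + rng.gauss(0.0, noise)  # noisy observation of f(x)
        # Sherman-Morrison update of V^{-1} after V <- V + x x^T
        vx = mat_vec(v_inv, x)
        denom = 1.0 + dot(x, vx)
        v_inv = [[v_inv[i][j] - vx[i] * vx[j] / denom for j in range(d)]
                 for i in range(d)]
        b = [bi + r * xi for bi, xi in zip(b, x)]
        regret += f_star - rewards[k]  # pseudo-regret measured against the true f
        curve.append(regret)
    return curve


if __name__ == "__main__":
    rng = random.Random(1)
    d, n_arms, rho = 3, 20, 0.2
    theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
    arms = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)]
            for _ in range(n_arms)]
    rewards = make_gap_adjusted_rewards(theta, arms, rho)
    print(is_gap_adjusted(rewards, theta, arms, rho))  # True by construction
    curve = lin_ucb(arms, rewards, horizon=500)
    print(curve[-1], curve[-1] / len(curve))
```

The Sherman-Morrison rank-one update keeps the sketch dependency-free. The regret numbers it prints depend on the random seed and are not results from the paper. The point is only that the learner is ordinary Lin-UCB. The misspecification lives entirely in how the rewards are generated.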
Pages: 1294-1303 (10 pages)