No-Regret Linear Bandits beyond Realizability

Cited by: 0

Authors:

Liu, Chong [1]

Yin, Ming [1]

Wang, Yu-Xiang [1]

Affiliations:

[1] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA

Source:

UNCERTAINTY IN ARTIFICIAL INTELLIGENCE | 2023 / Volume 216

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter epsilon that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever epsilon > 0. We describe a more natural model of misspecification that only requires the approximation error at each input x to be proportional to the suboptimality gap at x. It captures the intuition that, for optimization problems, near-optimal regions should matter more, so we can tolerate larger approximation errors in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm, designed for the realizable case, is automatically robust against such gap-adjusted misspecification. It achieves a near-optimal sqrt(T) regret for problems where the best-known regret is almost linear in the time horizon T. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
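To make the setting in the abstract concrete, here is a minimal sketch of a textbook LinUCB-style loop run on a reward that is not linear. It is not the authors' code. The arm set `ARMS`, the functions `linear_part` and `toy_reward`, and the parameters `lam`, `beta`, and `noise_sd` are all hypothetical choices made for illustration, and the constant confidence multiplier `beta` is a simplification of the time-dependent radius used in analyses. On these five arms, the toy reward deviates from the linear function by at most 3/7 of the suboptimality gap, so the error vanishes at the optimal arm.

```python
import math
import random

# Hypothetical toy problem: five arms with features x = (1, z).
ARMS = [(1.0, z) for z in (0.0, 0.25, 0.5, 0.75, 1.0)]


def linear_part(x):
    """Linear approximation theta^T x with theta = (0, 1) (an assumption)."""
    return x[1]


def toy_reward(x):
    """Non-linear reward; its deviation from linear_part is zero at z = 1."""
    z = x[1]
    return z - 0.3 * (1.0 - z) * math.sin(5.0 * z)


def linucb(arms, reward_fn, horizon, lam=1.0, beta=1.0, noise_sd=0.1, seed=0):
    """Run a LinUCB-style loop on a finite arm set; return pulled arm indices."""
    rng = random.Random(seed)
    d = len(arms[0])
    # a_inv is the inverse of the regularized Gram matrix lam*I + sum x x^T.
    a_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [0.0] * d  # running sum of reward-weighted features
    pulls = []
    for _ in range(horizon):
        # Ridge-regression estimate theta_hat = A^{-1} b.
        theta = [sum(a_inv[i][j] * b[j] for j in range(d)) for i in range(d)]
        best_idx, best_ucb = 0, -math.inf
        for k, x in enumerate(arms):
            ax = [sum(a_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
            # Confidence width sqrt(x^T A^{-1} x), clipped at 0 for safety.
            width = math.sqrt(max(sum(x[i] * ax[i] for i in range(d)), 0.0))
            ucb = sum(theta[i] * x[i] for i in range(d)) + beta * width
            if ucb > best_ucb:
                best_idx, best_ucb = k, ucb
        x = arms[best_idx]
        y = reward_fn(x) + rng.gauss(0.0, noise_sd)  # noisy observation
        # Sherman-Morrison rank-one update of the inverse Gram matrix.
        ax = [sum(a_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(x[i] * ax[i] for i in range(d))
        a_inv = [[a_inv[i][j] - ax[i] * ax[j] / denom for j in range(d)]
                 for i in range(d)]
        b = [b[i] + y * x[i] for i in range(d)]
        pulls.append(best_idx)
    return pulls


pulls = linucb(ARMS, toy_reward, horizon=200)
print("pulls per arm:", [pulls.count(k) for k in range(len(ARMS))])
```

The sketch only illustrates the mechanics: the learner fits a linear model to rewards that are not linear and still selects arms by an upper confidence bound. It makes no claim about reproducing the regret guarantee stated in the abstract.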


Pages: 1294-1303

Page count: 10

50 items in total

[21] A wide range no-regret theorem
Lehrer, E
GAMES AND ECONOMIC BEHAVIOR, 2003, 42 (01) : 101 - 115
[22] No-Regret Slice Reservation Algorithms
Monteil, Jean-Baptiste
Iosifidis, George
DaSilva, Luiz
IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021), 2021,
[23] No-regret dynamics and fictitious play
Viossat, Yannick
Zapechelnyuk, Andriy
JOURNAL OF ECONOMIC THEORY, 2013, 148 (02) : 825 - 842
[24] No-Regret Learning in Bayesian Games
Hartline, Jason
Syrgkanis, Vasilis
Tardos, Eva
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
[25] Strategizing against No-regret Learners
Deng, Yuan
Schneider, Jon
Sivan, Balasubramanian
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[26] Regret of Queueing Bandits
Krishnasamy, Subhashini
Sen, Rajat
Johari, Ramesh
Shakkottai, Sanjay
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
[27] On Fixed Convex Combinations of No-Regret Learners
Calliess, Jan-P.
MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, 2009, 5632 : 494 - 504
[28] No-Regret Learning Supports Voters' Competence
Spelda, Petr
Stritecky, Vit
Symons, John
SOCIAL EPISTEMOLOGY, 2024, 38 (05) : 543 - 559
[29] Opportunistic Approachability and Generalized No-Regret Problems
Bernstein, Andrey
Mannor, Shie
Shimkin, Nahum
MATHEMATICS OF OPERATIONS RESEARCH, 2014, 39 (04) : 1057 - 1083
[30] Limits and limitations of no-regret learning in games
Monnot, Barnabe
Piliouras, Georgios
KNOWLEDGE ENGINEERING REVIEW, 2017, 32
