No-Regret Linear Bandits beyond Realizability

Cited by: 0
Authors:
Liu, Chong [1]
Yin, Ming [1]
Wang, Yu-Xiang [1]
Affiliation:
[1] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
DOI: none
CLC number: TP18 (Artificial Intelligence Theory)
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter epsilon that measures the sup-norm error of the best linear approximation, which results in an unavoidable linear regret whenever epsilon > 0. We describe a more natural model of misspecification that only requires the approximation error at each input x to be proportional to the suboptimality gap at x. It captures the intuition that, for optimization problems, near-optimal regions matter more, so larger approximation errors can be tolerated in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm, designed for the realizable case, is automatically robust against such gap-adjusted misspecification: it achieves a near-optimal √T regret for problems where the best previously known regret is almost linear in the time horizon T. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
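For orientation, a rough sketch of the classical LinUCB loop that the abstract refers to is given below. This is an illustrative toy, not code from the paper: the finite action set, the slightly nonlinear reward function, and all parameter values (ridge penalty, bonus scale, horizon) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, lam, beta = 2, 500, 1.0, 1.0      # dimension, horizon, ridge penalty, bonus scale
arms = rng.normal(size=(20, d))         # hypothetical fixed finite action set
theta_star = np.array([1.0, 0.5])       # unknown linear component of the reward

def reward(x):
    # Toy misspecified reward: linear part plus a small nonlinear term and noise.
    return x @ theta_star + 0.05 * np.sin(x.sum()) + 0.1 * rng.normal()

V = lam * np.eye(d)                     # regularized design (Gram) matrix
b = np.zeros(d)                         # accumulated reward-weighted features
total_reward = 0.0
for t in range(T):
    theta_hat = np.linalg.solve(V, b)   # ridge-regression estimate of theta
    Vinv = np.linalg.inv(V)
    # Optimistic index: estimated reward plus an exploration bonus
    # proportional to the elliptical norm ||x||_{V^{-1}}.
    ucb = arms @ theta_hat + beta * np.sqrt(
        np.einsum('id,dk,ik->i', arms, Vinv, arms))
    x = arms[np.argmax(ucb)]            # play the optimistic action
    r = reward(x)
    V += np.outer(x, x)                 # rank-one update of the design matrix
    b += r * x
    total_reward += r
```

The point of the paper is that this unmodified loop, which assumes realizability, still enjoys sublinear regret under the gap-adjusted misspecification model described in the abstract.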
Pages: 1294-1303 (10 pages)