Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback

Cited by: 0
Authors
Li, Junfan [1]
Liao, Shizhong [1]
Affiliation
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300350, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Model selection; Online learning; Bandit; Kernel method
DOI
10.1007/978-3-031-26412-2_21
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we improve the regret bounds for online kernel selection under bandit feedback. The previous algorithm enjoys an $O((\|f\|^2_{\mathcal{H}_i}+1)K^{1/3}T^{2/3})$ expected bound for Lipschitz loss functions. We prove two types of regret bounds that improve on this result. For smooth loss functions, we propose an algorithm with an $O(U^{2/3}K^{1/3}(\sum_{i=1}^{K} L_T(f^*_i))^{2/3})$ expected bound, where $L_T(f^*_i)$ is the cumulative loss of the optimal hypothesis in $\mathbb{H}_i=\{f\in\mathcal{H}_i:\|f\|_{\mathcal{H}_i}\le U\}$. This data-dependent bound preserves the previous worst-case bound and is smaller if most of the candidate kernels match the data well. For Lipschitz loss functions, we propose an algorithm with an $O(U\sqrt{KT}\ln^{2/3}T)$ expected bound that asymptotically improves the previous bound. We apply the two algorithms to online kernel selection with a time constraint and prove new regret bounds that match or improve the previous $O(\sqrt{T\ln K}+\|f\|_{\mathcal{H}_i}\max\{\sqrt{T},T/\sqrt{R}\})$ expected bound, where $R$ is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.
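The sketch below is a minimal, hypothetical illustration of the general setting the paper studies, not the paper's algorithms: an exponential-weights sampler picks one candidate kernel per round, observes only that kernel's loss (bandit feedback), feeds an importance-weighted loss estimate back to the sampler, and applies an online functional-gradient step to the sampled kernel's hypothesis. All bandwidths, step sizes, and the toy data stream are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate Gaussian kernel bandwidths (illustrative only).
BANDWIDTHS = [0.1, 1.0, 10.0]
K = len(BANDWIDTHS)
ETA_BANDIT = 0.1   # step size of the exponential-weights sampler (assumed)
ETA_MODEL = 0.2    # step size of the per-kernel online gradient update (assumed)
T = 1000

def gaussian_kernel(x, Xs, sigma):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) of x against every stored point in Xs."""
    if not Xs:
        return np.zeros(0)
    sq_dists = np.sum((np.asarray(Xs) - x) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Each per-kernel hypothesis is kept in dual form: f_i(x) = sum_j coef_ij * k_i(x_j, x).
supports = [[] for _ in range(K)]
coefs = [[] for _ in range(K)]
log_weights = np.zeros(K)   # log-weights of the sampling distribution over kernels
cum_loss = 0.0

for t in range(T):
    # Toy data stream: y = sin(2 * x_0) + noise (purely for illustration).
    x = rng.uniform(-2.0, 2.0, size=2)
    y = np.sin(2.0 * x[0]) + 0.1 * rng.normal()

    # Sample one kernel; bandit feedback means only its loss will be observed.
    p = np.exp(log_weights - log_weights.max())
    p /= p.sum()
    i = rng.choice(K, p=p)

    # Predict with the sampled kernel's current hypothesis and suffer the squared loss.
    k_vec = gaussian_kernel(x, supports[i], BANDWIDTHS[i])
    y_hat = float(k_vec @ np.asarray(coefs[i])) if coefs[i] else 0.0
    loss = (y_hat - y) ** 2
    cum_loss += loss

    # Importance-weighted loss estimate keeps the sampler's update unbiased.
    log_weights[i] -= ETA_BANDIT * loss / p[i]

    # Online functional-gradient step for the sampled kernel only
    # (the support set grows with t; budgeted variants would bound it).
    supports[i].append(x)
    coefs[i].append(-ETA_MODEL * 2.0 * (y_hat - y))

print(f"average squared loss over {T} rounds: {cum_loss / T:.4f}")
```

The importance weighting (dividing the observed loss by the sampling probability) is the standard device for handling bandit feedback; the time-constrained setting discussed in the abstract would additionally cap the per-round work, e.g. by limiting the stored support points.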
Pages: 333-348 (16 pages)