Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback

Cited by: 0
Authors
Li, Junfan [1]
Liao, Shizhong [1]
Affiliation
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300350, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Model selection; Online learning; Bandit; Kernel method
DOI
10.1007/978-3-031-26412-2_21
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we improve the regret bounds for online kernel selection under bandit feedback. The previous algorithm enjoys an $O((\|f\|^2_{\mathcal{H}_i}+1)K^{1/3}T^{2/3})$ expected bound for Lipschitz loss functions. We prove two types of regret bounds that improve on this result. For smooth loss functions, we propose an algorithm with an $O(U^{2/3}K^{1/3}(\sum_{i=1}^{K} L_T(f^*_i))^{2/3})$ expected bound, where $L_T(f^*_i)$ is the cumulative loss of the optimal hypothesis in $\mathbb{H}_i=\{f\in\mathcal{H}_i:\|f\|_{\mathcal{H}_i}\le U\}$. This data-dependent bound preserves the previous worst-case bound and is smaller if most of the candidate kernels match the data well. For Lipschitz loss functions, we propose an algorithm with an $O(U\sqrt{KT}\ln^{2/3}T)$ expected bound that asymptotically improves the previous bound. We apply the two algorithms to online kernel selection with a time constraint and prove new regret bounds that match or improve the previous $O(\sqrt{T\ln K}+\|f\|_{\mathcal{H}_i}\max\{\sqrt{T},T/\sqrt{R}\})$ expected bound, where $R$ is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.
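The sketch below is a minimal, hypothetical illustration of the general setting the paper studies, not the paper's algorithms: an exponential-weights sampler picks one candidate kernel per round, observes only that kernel's loss (bandit feedback), feeds an importance-weighted loss estimate back to the sampler, and applies an online functional-gradient step to the sampled kernel's hypothesis. All bandwidths, step sizes, and the toy data stream are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate Gaussian kernel bandwidths (illustrative only).
BANDWIDTHS = [0.1, 1.0, 10.0]
K = len(BANDWIDTHS)
ETA_BANDIT = 0.1   # step size of the exponential-weights sampler (assumed)
ETA_MODEL = 0.2    # step size of the per-kernel online gradient update (assumed)
T = 1000

def gaussian_kernel(x, Xs, sigma):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) of x against every stored point in Xs."""
    if not Xs:
        return np.zeros(0)
    sq_dists = np.sum((np.asarray(Xs) - x) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Each per-kernel hypothesis is kept in dual form: f_i(x) = sum_j coef_ij * k_i(x_j, x).
supports = [[] for _ in range(K)]
coefs = [[] for _ in range(K)]
log_weights = np.zeros(K)   # log-weights of the sampling distribution over kernels
cum_loss = 0.0

for t in range(T):
    # Toy data stream: y = sin(2 * x_0) + noise (purely for illustration).
    x = rng.uniform(-2.0, 2.0, size=2)
    y = np.sin(2.0 * x[0]) + 0.1 * rng.normal()

    # Sample one kernel; bandit feedback means only its loss will be observed.
    p = np.exp(log_weights - log_weights.max())
    p /= p.sum()
    i = rng.choice(K, p=p)

    # Predict with the sampled kernel's current hypothesis and suffer the squared loss.
    k_vec = gaussian_kernel(x, supports[i], BANDWIDTHS[i])
    y_hat = float(k_vec @ np.asarray(coefs[i])) if coefs[i] else 0.0
    loss = (y_hat - y) ** 2
    cum_loss += loss

    # Importance-weighted loss estimate keeps the sampler's update unbiased.
    log_weights[i] -= ETA_BANDIT * loss / p[i]

    # Online functional-gradient step for the sampled kernel only
    # (the support set grows with t; budgeted variants would bound it).
    supports[i].append(x)
    coefs[i].append(-ETA_MODEL * 2.0 * (y_hat - y))

print(f"average squared loss over {T} rounds: {cum_loss / T:.4f}")
```

The importance weighting (dividing the observed loss by the sampling probability) is the standard device for handling bandit feedback; the time-constrained setting discussed in the abstract would additionally cap the per-round work, e.g. by limiting the stored support points.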
Pages: 333-348 (16 pages)