On the Sublinear Regret of GP-UCB

Cited: 0
Authors
Whitehouse, Justin [1 ]
Wu, Zhiwei Steven [1 ]
Ramdas, Aaditya [1 ]
Institutions
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
BOUNDS; BANDITS;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
In the kernelized bandit problem, a learner aims to sequentially compute the optimum of a function lying in a reproducing kernel Hilbert space given only noisy evaluations at sequentially chosen points. In particular, the learner aims to minimize regret, which is a measure of the suboptimality of the choices made. Arguably the most popular algorithm is the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm, which involves acting based on a simple linear estimator of the unknown function. Despite its popularity, existing analyses of GP-UCB give a suboptimal regret rate, which fails to be sublinear for many commonly used kernels such as the Matérn kernel. This has led to a longstanding open question: are existing regret analyses for GP-UCB tight, or can bounds be improved by using more sophisticated analytical techniques? In this work, we resolve this open question and show that GP-UCB enjoys nearly optimal regret. In particular, our results yield sublinear regret rates for the Matérn kernel, improving over the state-of-the-art analyses and partially resolving a COLT open problem posed by Vakili et al. Our improvements rely on a key technical contribution: regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel k. Applying this key idea together with a largely overlooked concentration result in separable Hilbert spaces (for which we provide an independent, simplified derivation), we are able to provide a tighter analysis of the GP-UCB algorithm.
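The loop the abstract describes can be sketched in a few lines: at each round, fit a kernel ridge (GP posterior mean) estimator to the observations so far and query the candidate point maximizing mean plus a confidence-width bonus. This is a minimal illustrative sketch, not the authors' analyzed variant; the RBF kernel, and the `lengthscale`, `beta`, and `lam` values are assumptions chosen for demonstration (the paper's key idea is to scale the regularizer `lam` with the kernel's smoothness, which a constant here does not capture).

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel matrix between row-sets A (n,d) and B (m,d)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * lengthscale**2))

def gp_ucb(f, candidates, T, lam=1.0, beta=2.0, noise=0.1, rng=None):
    """Minimal GP-UCB loop over a finite candidate set.

    Each round queries argmax of: kernel ridge mean + beta * posterior std.
    lam is a fixed ridge regularizer here; the paper instead tunes it to
    the kernel's smoothness to obtain its tighter regret bounds.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    X, y = [], []
    for t in range(T):
        if not X:
            idx = int(rng.integers(len(candidates)))  # first query: random
        else:
            Xa = np.array(X)
            K = rbf_kernel(Xa, Xa) + lam * np.eye(len(X))
            alpha = np.linalg.solve(K, np.array(y))   # ridge coefficients
            kx = rbf_kernel(candidates, Xa)           # (n_cand, t)
            mean = kx @ alpha
            # posterior variance: k(x,x) - k_x K^{-1} k_x^T, with k(x,x)=1
            var = 1.0 - np.einsum('ij,ij->i', kx, np.linalg.solve(K, kx.T).T)
            ucb = mean + beta * np.sqrt(np.maximum(var, 0.0))
            idx = int(np.argmax(ucb))
        x = candidates[idx]
        X.append(x)
        y.append(f(x) + noise * rng.standard_normal())  # noisy evaluation
    return np.array(X), np.array(y)
```

For example, maximizing a simple concave function on a 1-D grid: `gp_ucb(lambda x: -(x[0] - 0.5)**2, np.linspace(0, 1, 51)[:, None], T=20)` returns the 20 queried points and their noisy values, with later queries concentrating near the optimum.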
Pages: 11