Weighted Gaussian Process Bandits for Non-stationary Environments

Citations: 0
Authors
Deng, Yuntian [1]
Zhou, Xingyu [2]
Kim, Baekjin [3]
Tewari, Ambuj [3]
Gupta, Abhishek [1]
Shroff, Ness [1]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Wayne State Univ, Detroit, MI 48202 USA
[3] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to vary over time within a reproducing kernel Hilbert space (RKHS). For this setting, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is coping with infinite-dimensional feature maps. To address it, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee for weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, we establish a novel concentration inequality for weighted Gaussian process regression with general weights. We also provide universal and weight-dependent upper bounds on the weighted maximum information gain. These results are of independent interest for applications such as news ranking and adaptive pricing, where weights can capture the importance or quality of data. Finally, we conduct experiments showing that the proposed algorithm compares favorably with existing methods in many cases.
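The abstract does not spell out the weighted regression step, so the following is only a minimal sketch of the general idea, not the paper's WGP-UCB estimator or its confidence width. It assumes weighted kernel ridge regression (equivalent to GP regression where observation s has noise variance lam / weights[s]), a squared-exponential kernel, and exponential discounting weights. The names `rbf_kernel`, `weighted_gp_posterior`, `ucb_choice` and the parameters `lam`, `gamma`, `beta`, `lengthscale` are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    # Squared-exponential kernel between 1-D input arrays a and b (assumed kernel).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def weighted_gp_posterior(x_obs, y_obs, weights, x_query, lam=1.0):
    # Weighted kernel ridge regression: minimizes
    #   sum_s weights[s] * (y_s - f(x_s))^2 + lam * ||f||^2,
    # whose solution is f(x) = k(x)^T (K + lam * W^{-1})^{-1} y.
    K = rbf_kernel(x_obs, x_obs)
    k_q = rbf_kernel(x_obs, x_query)              # shape (n_obs, n_query)
    A = K + lam * np.diag(1.0 / weights)
    # Solve for y and for every query column in a single call.
    sol = np.linalg.solve(A, np.column_stack([y_obs, k_q]))
    mean = k_q.T @ sol[:, 0]
    var = 1.0 - np.sum(k_q * sol[:, 1:], axis=0)  # k(x, x) = 1 for this kernel
    return mean, np.maximum(var, 0.0)

def ucb_choice(x_obs, y_obs, x_query, gamma=0.9, beta=2.0):
    # Assumed exponential discounting: the most recent observation gets weight 1,
    # older observations are down-weighted by powers of gamma.
    n = len(x_obs)
    weights = gamma ** np.arange(n - 1, -1, -1)
    mean, var = weighted_gp_posterior(x_obs, y_obs, weights, x_query)
    # Pick the candidate with the largest upper confidence bound.
    return x_query[np.argmax(mean + beta * np.sqrt(var))]
```

With all weights equal to 1 this reduces to standard GP regression with noise variance `lam`, which is one informal way to see how a weighted scheme can contain the stationary GP-UCB setting as a special case.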
Pages: 24