Weighted Gaussian Process Bandits for Non-stationary Environments

Cited by: 0
Authors
Deng, Yuntian [1]
Zhou, Xingyu [2]
Kim, Baekjin [3]
Tewari, Ambuj [3]
Gupta, Abhishek [1]
Shroff, Ness [1]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Wayne State Univ, Detroit, MI 48202 USA
[3] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to be time-varying within a reproducing kernel Hilbert space (RKHS). For this setting, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is how to cope with infinite-dimensional feature maps. To address it, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, we derive a novel concentration inequality for weighted Gaussian process regression with general weights. We also provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains. These results are of independent interest for applications such as news ranking and adaptive pricing, where weights can be adopted to capture the importance or quality of data. Finally, we conduct experiments to highlight the favorable gains of the proposed algorithm in many cases when compared to existing methods.
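The abstract does not spell out WGP-UCB itself, so the following is only a minimal sketch of the general idea of a UCB rule built on weighted GP regression, not the authors' algorithm. It assumes a squared-exponential kernel, exponential discount weights `gamma ** (t - s)`, a fixed exploration constant `beta`, and a confidence width taken from the posterior of a GP whose sample s has noise variance `lam / w_s`; all of these names and choices are illustrative assumptions, and the confidence width analyzed in the paper may differ.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def weighted_gp_posterior(x_obs, y_obs, weights, x_query, lam=0.1):
    """Mean and variance of a GP fit by weighted regularised least squares.

    Minimising sum_s w_s * (y_s - f(x_s))**2 + lam * ||f||^2 over the RKHS
    gives the mean k(x)^T (K + lam * W^{-1})^{-1} y, which is GP regression
    where observation s has noise variance lam / w_s (an assumed model).
    """
    K = rbf_kernel(x_obs, x_obs)
    k_star = rbf_kernel(x_obs, x_query)
    A = K + lam * np.diag(1.0 / weights)
    mean = k_star.T @ np.linalg.solve(A, y_obs)
    # The prior variance k(x, x) equals 1 for this kernel.
    var = 1.0 - np.sum(k_star * np.linalg.solve(A, k_star), axis=0)
    return mean, np.maximum(var, 0.0)


def ucb_choice(x_obs, y_obs, candidates, gamma=0.9, beta=2.0):
    """Pick the candidate maximising mean + beta * std under discounting."""
    t = len(x_obs)
    # Hypothetical weighting: the newest sample gets weight 1, older ones decay.
    weights = gamma ** np.arange(t - 1, -1, -1)
    mean, var = weighted_gp_posterior(x_obs, y_obs, weights, candidates)
    return candidates[np.argmax(mean + beta * np.sqrt(var))]
```

With two conflicting observations at the same input, down-weighting the older one pulls the estimate toward the newer reward, which is the behavior a non-stationary setting calls for. Uniform weights recover standard GP regression with noise variance `lam`.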
Pages: 24