Weighted Gaussian Process Bandits for Non-stationary Environments

Cited by: 0
Authors
Deng, Yuntian [1]
Zhou, Xingyu [2]
Kim, Baekjin [3]
Tewari, Ambuj [3]
Gupta, Abhishek [1]
Shroff, Ness [1]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Wayne State Univ, Detroit, MI 48202 USA
[3] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to be time-varying within a reproducing kernel Hilbert space (RKHS). To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is how to cope with infinite-dimensional feature maps. To address it, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, we derive a novel concentration inequality for weighted Gaussian process regression with general weights. We also provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains. These results are of independent interest for applications such as news ranking and adaptive pricing, where weights can be adopted to capture the importance or quality of data. Finally, we conduct experiments to highlight the favorable gains of the proposed algorithm in many cases when compared to existing methods.
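The idea of a UCB rule built on weighted GP regression can be illustrated with a minimal sketch. It is not the paper's WGP-UCB: the abstract gives no formulas, so everything below is an assumption made for illustration, including the RBF kernel, the exponentially discounted weights `gamma ** age`, the regularizer `lam`, the fixed exploration factor `beta`, and the function names. The sketch treats weighted kernel ridge regression as GP regression in which observation s has noise variance `lam / w_s`, so down-weighted (older) points constrain the estimate less.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two sets of 1-D points (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def weighted_gp_posterior(x_obs, y_obs, weights, x_query, lam=1.0):
    """Posterior mean and std of a weighted kernel ridge / GP regressor.

    Minimizing sum_s w_s * (y_s - f(x_s))**2 + lam * ||f||**2 gives
    mean(x) = k(x)^T (K + lam * W^{-1})^{-1} y, i.e. GP regression where
    observation s has noise variance lam / w_s.
    """
    K = rbf_kernel(x_obs, x_obs)
    k_q = rbf_kernel(x_obs, x_query)            # shape (n_obs, n_query)
    A = K + lam * np.diag(1.0 / weights)
    sol = np.linalg.solve(A, k_q)               # A^{-1} k(x) for every query point
    mean = sol.T @ y_obs
    var = 1.0 - np.sum(k_q * sol, axis=0)       # k(x, x) = 1 for the RBF kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))


def ucb_choice(x_obs, y_obs, x_grid, gamma=0.9, beta=2.0, lam=1.0):
    """Index of the grid point maximizing mean + beta * std.

    Weights are gamma ** age (the newest observation has weight 1); this
    discounting scheme is an illustrative assumption, not taken from the abstract.
    """
    n = len(x_obs)
    weights = gamma ** np.arange(n - 1, -1, -1)
    mean, std = weighted_gp_posterior(x_obs, y_obs, weights, x_grid, lam)
    return int(np.argmax(mean + beta * std))


# Usage: three past observations, candidate actions on a grid in [0, 1].
x_obs = np.array([0.1, 0.5, 0.9])
y_obs = np.array([0.0, 1.0, 0.0])
x_grid = np.linspace(0.0, 1.0, 11)
next_index = ucb_choice(x_obs, y_obs, x_grid)
```

With `gamma=1.0` all weights equal one and the sketch reduces to an ordinary GP-UCB-style rule, which mirrors the abstract's remark that the weighted setting generalizes standard GP-UCB. The confidence width used in the paper comes from its concentration inequality and may differ from the plain posterior standard deviation used here.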
Pages: 24
Related papers
50 in total
  • [1] Weighted Linear Bandits for Non-Stationary Environments
    Russac, Yoan
    Vernade, Claire
    Cappe, Olivier
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [2] Stochastic Bandits with Graph Feedback in Non-Stationary Environments
    National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
    Unknown, 100102, China
    AAAI Conf. Artif. Intell., AAAI, 1600, (8758-8766): : 8758 - 8766
  • [3] Stochastic Bandits with Graph Feedback in Non-Stationary Environments
    Lu, Shiyin
    Hu, Yao
    Zhang, Lijun
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8758 - 8766
  • [4] Non-stationary Bandits with Knapsacks
    Liu, Shang
    Jiang, Jiashuo
    Li, Xiaocheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [5] Unifying Clustered and Non-stationary Bandits
    Li, Chuanhao
    Wu, Qingyun
    Wang, Hongning
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [6] Non-stationary Bandits with Heavy Tail
    Pan, Weici
    Liu, Zhenhua
    Performance Evaluation Review, 2024, 52 (02): : 33 - 35
  • [7] Fast Algorithm for Non-Stationary Gaussian Process Prediction
    Zhang, Yulai
    Luo, Guiming
    PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014, : 3150 - 3151
  • [8] Recursive prediction algorithm for non-stationary Gaussian Process
    Zhang, Yulai
    Luo, Guiming
    JOURNAL OF SYSTEMS AND SOFTWARE, 2017, 127 : 295 - 301
  • [9] Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model
    Li, Chang
    de Rijke, Maarten
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2859 - 2865
  • [10] A Simple Approach for Non-stationary Linear Bandits
    Zhao, Peng
    Zhang, Lijun
    Jiang, Yuan
    Zhou, Zhi-Hua
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 746 - 754