Stochastic Graphical Bandits with Heavy-Tailed Rewards

Cited by: 0
Authors
Gou, Yutian [1]
Yi, Jinfeng [2]
Zhang, Lijun [1]
Affiliations
[1] Nanjing Univ, Nat Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] JD AI Res, Beijing 100176, Peoples R China
Keywords
MULTIARMED BANDIT; REGRET
DOI
Not available
CLC number
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We consider stochastic graphical bandits, where after pulling an arm, the decision maker observes the rewards of not only the chosen arm but also its neighbors in a feedback graph. Most existing work assumes that the rewards are drawn from bounded or at least sub-Gaussian distributions, an assumption that may be violated in many practical scenarios such as social advertising and financial markets. To address this issue, we investigate stochastic graphical bandits with heavy-tailed rewards, where the reward distributions have finite moments of order 1 + ε for some ε ∈ (0, 1]. First, we develop a UCB-type algorithm whose expected regret is upper bounded by a sum of gap-based quantities over a clique covering of the feedback graph. The key idea is to estimate the reward means of the selected arm's neighbors with more refined robust estimators and to construct a graph-based upper confidence bound for selecting candidates. Second, we design an elimination-based strategy and improve the regret bound to a gap-based sum whose size is controlled by the independence number of the feedback graph. For benign graphs, the independence number can be smaller than the size of the clique covering, resulting in tighter regret bounds. Finally, we conduct experiments on synthetic data to demonstrate the effectiveness of our methods.
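To make the abstract's key idea concrete, below is a minimal sketch (not the authors' algorithm) of a UCB-style strategy on a feedback graph that uses a truncated-mean estimator, the standard robust-estimation trick for rewards with finite (1 + ε)-th moments. The function names (`truncated_mean`, `run_graphical_ucb`), the moment bound `u`, the confidence parameter `delta`, and the Student's t noise model are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def truncated_mean(samples, eps, u, delta):
    """Robust mean estimate: zero out samples whose magnitude exceeds a
    threshold that grows with the sample index (standard truncation trick
    for distributions with finite (1 + eps)-th moment bounded by u)."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    if n == 0:
        return 0.0
    idx = np.arange(1, n + 1)
    thresh = (u * idx / np.log(2.0 / delta)) ** (1.0 / (1.0 + eps))
    kept = np.where(np.abs(samples) <= thresh, samples, 0.0)
    return kept.mean()

def run_graphical_ucb(means, graph, T, eps=1.0, u=2.0, delta=0.01, seed=0):
    """Play T rounds; after pulling an arm, the rewards of all its neighbors
    in `graph` (lists of observed arms, including the arm itself) are also
    observed and fed to their robust estimators."""
    rng = np.random.default_rng(seed)
    K = len(means)
    obs = [[] for _ in range(K)]   # observed rewards per arm
    total = 0.0
    for t in range(T):
        ucb = np.empty(K)
        for i in range(K):
            n = len(obs[i])
            if n == 0:
                ucb[i] = np.inf    # force at least one observation per arm
            else:
                # Confidence width matching the truncated-mean deviation rate.
                width = 4.0 * u ** (1.0 / (1.0 + eps)) * (
                    np.log(2.0 / delta) / n) ** (eps / (1.0 + eps))
                ucb[i] = truncated_mean(obs[i], eps, u, delta) + width
        arm = int(np.argmax(ucb))
        # Heavy-tailed feedback: Student's t noise around each neighbor's mean.
        for j in graph[arm]:
            obs[j].append(means[j] + rng.standard_t(df=3))
        total += means[arm]
    return total

# Example: a 4-arm path graph where pulling an arm also reveals its neighbors.
# graph = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
# run_graphical_ucb(means=[0.2, 0.5, 0.9, 0.4], graph=graph, T=2000)
```

The sketch illustrates the two ingredients highlighted in the abstract, a robust mean estimator for heavy-tailed observations and a confidence bound fed by side observations from the feedback graph, but it does not reproduce the paper's clique-covering or elimination-based analyses.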
Pages: 734-744
Number of pages: 11