Stochastic Graphical Bandits with Heavy-Tailed Rewards

Cited by: 0
Authors
Gou, Yutian [1]
Yi, Jinfeng [2]
Zhang, Lijun [1]
Affiliations
[1] Nanjing Univ, Nat Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] JD AI Res, Beijing 100176, Peoples R China
Keywords
MULTIARMED BANDIT; REGRET
DOI
Not available
CLC number
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We consider stochastic graphical bandits, where after pulling an arm, the decision maker observes the rewards of not only the chosen arm but also its neighbors in a feedback graph. Most existing work assumes that the rewards are drawn from bounded or at least sub-Gaussian distributions, an assumption that may be violated in many practical scenarios such as social advertising and financial markets. To address this issue, we investigate stochastic graphical bandits with heavy-tailed rewards, where the reward distributions have finite moments of order 1 + ε for some ε ∈ (0, 1]. First, we develop a UCB-type algorithm whose expected regret is upper bounded by a sum of gap-based quantities over a clique covering of the feedback graph. The key idea is to estimate the reward means of the selected arm's neighbors with more refined robust estimators and to construct a graph-based upper confidence bound for selecting candidates. Second, we design an elimination-based strategy and improve the regret bound to a gap-based sum whose size is controlled by the independence number of the feedback graph. For benign graphs, the independence number can be smaller than the size of the clique covering, resulting in tighter regret bounds. Finally, we conduct experiments on synthetic data to demonstrate the effectiveness of our methods.
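To make the abstract's key idea concrete, below is a minimal sketch (not the authors' algorithm) of a UCB-style strategy on a feedback graph that uses a truncated-mean estimator, the standard robust-estimation trick for rewards with finite (1 + ε)-th moments. The function names (`truncated_mean`, `run_graphical_ucb`), the moment bound `u`, the confidence parameter `delta`, and the Student's t noise model are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def truncated_mean(samples, eps, u, delta):
    """Robust mean estimate: zero out samples whose magnitude exceeds a
    threshold that grows with the sample index (standard truncation trick
    for distributions with finite (1 + eps)-th moment bounded by u)."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    if n == 0:
        return 0.0
    idx = np.arange(1, n + 1)
    thresh = (u * idx / np.log(2.0 / delta)) ** (1.0 / (1.0 + eps))
    kept = np.where(np.abs(samples) <= thresh, samples, 0.0)
    return kept.mean()

def run_graphical_ucb(means, graph, T, eps=1.0, u=2.0, delta=0.01, seed=0):
    """Play T rounds; after pulling an arm, the rewards of all its neighbors
    in `graph` (lists of observed arms, including the arm itself) are also
    observed and fed to their robust estimators."""
    rng = np.random.default_rng(seed)
    K = len(means)
    obs = [[] for _ in range(K)]   # observed rewards per arm
    total = 0.0
    for t in range(T):
        ucb = np.empty(K)
        for i in range(K):
            n = len(obs[i])
            if n == 0:
                ucb[i] = np.inf    # force at least one observation per arm
            else:
                # Confidence width matching the truncated-mean deviation rate.
                width = 4.0 * u ** (1.0 / (1.0 + eps)) * (
                    np.log(2.0 / delta) / n) ** (eps / (1.0 + eps))
                ucb[i] = truncated_mean(obs[i], eps, u, delta) + width
        arm = int(np.argmax(ucb))
        # Heavy-tailed feedback: Student's t noise around each neighbor's mean.
        for j in graph[arm]:
            obs[j].append(means[j] + rng.standard_t(df=3))
        total += means[arm]
    return total

# Example: a 4-arm path graph where pulling an arm also reveals its neighbors.
# graph = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
# run_graphical_ucb(means=[0.2, 0.5, 0.9, 0.4], graph=graph, T=2000)
```

The sketch illustrates the two ingredients highlighted in the abstract, a robust mean estimator for heavy-tailed observations and a confidence bound fed by side observations from the feedback graph, but it does not reproduce the paper's clique-covering or elimination-based analyses.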
Pages: 734-744
Number of pages: 11