A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Cited by: 0

Authors:

Xiong, Zhihan [1]

Camilleri, Romain [1]

Fazel, Maryam [1]

Jain, Lalit [1]

Jamieson, Kevin [1]

Affiliations:

[1] Univ Washington, Seattle, WA 98195 USA

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238 | 2024, Volume 238

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X} \subset \mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\{\theta_t\}_{t=1}^{T}$, an algorithm will aim to correctly identify the best arm $x^* := \arg\max_{x \in \mathcal{X}} x^\top \sum_{t=1}^{T} \theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where $\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability decreases as $\exp(-T/\rho^*)$ for a problem-dependent constant $\rho^*$. But in many real-world A/B/n multivariate testing scenarios that motivate our work, the environment is non-stationary and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well-known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time then the error probability decreases as $\exp(-T \Delta_{(1)}^2 / d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T} \sum_{t=1}^{T} \theta_t$. As there exist environments where $\Delta_{(1)}^2 / d \ll 1/\rho^*$, we are motivated to propose a novel algorithm P1-RAGE that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of P1-RAGE and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.
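The non-adaptive G-optimal baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not P1-RAGE, and nothing below is taken from the paper's own code. The assumptions are: the design is approximated with Frank-Wolfe iterations using the Fedorov-Wynn step (relying on the Kiefer-Wolfowitz equivalence of D- and G-optimal designs), the parameter average is estimated with an inverse-propensity-weighted estimator, and all function names, the arm set, the drifting parameter sequence, and the noise level are hypothetical choices made for illustration.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def design_matrix(arms, lam):
    """A(lam) = sum_i lam[i] * x_i x_i^T, returned as a d x d list of lists."""
    d = len(arms[0])
    A = [[0.0] * d for _ in range(d)]
    for x, w in zip(arms, lam):
        for i in range(d):
            for j in range(d):
                A[i][j] += w * x[i] * x[j]
    return A

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def g_optimal_design(arms, tol=0.01, max_iters=10000):
    """Approximate G-optimal design: stop once max_x x^T A^{-1} x <= d (1 + tol)."""
    n, d = len(arms), len(arms[0])
    lam = [1.0 / n] * n  # uniform start; assumes the arms span R^d
    for _ in range(max_iters):
        A = design_matrix(arms, lam)
        var = [dot(x, solve(A, x)) for x in arms]
        j = max(range(n), key=lambda i: var[i])
        g = var[j]
        if g <= d * (1.0 + tol):
            break
        step = (g - d) / (d * (g - 1.0))  # Fedorov-Wynn line-search step
        lam = [(1.0 - step) * w for w in lam]
        lam[j] += step
    return lam

def recommend_arm(arms, lam, thetas, noise_sd=1.0, seed=0):
    """Sample arms i.i.d. from lam, then recommend argmax_x x^T theta_hat.

    theta_hat = (1/T) sum_t A(lam)^{-1} x_t r_t is unbiased for the average
    of the theta_t, whether or not the sequence is stationary.
    """
    rng = random.Random(seed)
    d = len(arms[0])
    A = design_matrix(arms, lam)
    total = [0.0] * d
    for theta in thetas:
        x = rng.choices(arms, weights=lam)[0]
        r = dot(x, theta) + rng.gauss(0.0, noise_sd)  # noisy linear reward
        z = solve(A, x)
        for i in range(d):
            total[i] += r * z[i]
    theta_hat = [s / len(thetas) for s in total]
    best = max(range(len(arms)), key=lambda i: dot(arms[i], theta_hat))
    return best, theta_hat

# Hypothetical example: three arms in R^2 and a parameter that alternates
# between two values whose average is (1, 0), so arm 0 is best on average.
arms = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
lam = g_optimal_design(arms)
thetas = [(2.0, 0.5) if t % 2 == 0 else (0.0, -0.5) for t in range(5000)]
best, theta_hat = recommend_arm(arms, lam, thetas)
print("design:", lam)
print("recommended arm index:", best)
```

Because the sampling distribution never changes with the observed rewards, the estimate targets the time-averaged parameter regardless of how $\theta_t$ drifts, which is the robustness property the abstract attributes to this baseline.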


Pages: 24

