A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Cited by: 0
Authors
Xiong, Zhihan [1]
Camilleri, Romain [1]
Fazel, Maryam [1]
Jain, Lalit [1]
Jamieson, Kevin [1]
Affiliations
[1] Univ Washington, Seattle, WA 98195 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X} \subset \mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\{\theta_t\}_{t=1}^{T}$, an algorithm will aim to correctly identify the best arm $x^* := \arg\max_{x \in \mathcal{X}} x^\top \sum_{t=1}^{T} \theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where $\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability decreases as $\exp(-T/\rho^*)$ for a problem-dependent constant $\rho^*$. But in many real-world A/B/n multivariate testing scenarios that motivate our work, the environment is non-stationary and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time, then the error probability decreases as $\exp(-T \Delta_{(1)}^2 / d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T} \sum_{t=1}^{T} \theta_t$. As there exist environments where $\Delta_{(1)}^2 / d \ll 1/\rho^*$, we are motivated to propose a novel algorithm P1-RAGE that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of P1-RAGE and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.
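The non-adaptive baseline mentioned in the abstract (sampling arms from a G-optimal design at every round) can be illustrated with a small sketch. This is not P1-RAGE and not the authors' code. It assumes a Frank-Wolfe style solver for the design, a moment-based estimate of the average parameter, Gaussian noise, and a toy arm set and parameter sequence. The names `g_optimal_design`, `identify_best_arm`, `design_matrix`, and `leverages` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def design_matrix(X, lam):
    """A(lam) = sum_i lam_i x_i x_i^T for arms stored as the rows of X."""
    return X.T @ (lam[:, None] * X)


def leverages(X, lam):
    """x_i^T A(lam)^{-1} x_i for every arm i."""
    A_inv = np.linalg.inv(design_matrix(X, lam))
    return np.einsum("ij,jk,ik->i", X, A_inv, X)


def g_optimal_design(X, tol=0.01, max_iters=10_000):
    """Frank-Wolfe iterations with exact line search (assumed solver).

    Assumes d >= 2 and that the arms span R^d. Stops once the largest
    leverage is within a factor (1 + tol) of d, the optimal value given
    by the Kiefer-Wolfowitz theorem.
    """
    n, d = X.shape
    lam = np.full(n, 1.0 / n)
    for _ in range(max_iters):
        lev = leverages(X, lam)
        i = int(np.argmax(lev))
        if lev[i] <= (1.0 + tol) * d:
            break
        step = (lev[i] / d - 1.0) / (lev[i] - 1.0)
        lam = (1.0 - step) * lam
        lam[i] += step
    return lam / lam.sum()


def identify_best_arm(X, lam, thetas, rng, noise_std=1.0):
    """Sample arms i.i.d. from lam, then recommend argmax_x x^T theta_hat.

    theta_hat = A(lam)^{-1} (1/T) sum_t x_t r_t is unbiased for the average
    of theta_t because the sampling distribution never adapts to rewards.
    """
    T = len(thetas)
    pulls = rng.choice(len(X), size=T, p=lam)
    rewards = np.einsum("td,td->t", X[pulls], thetas)
    rewards = rewards + noise_std * rng.standard_normal(T)
    moment = (X[pulls] * rewards[:, None]).mean(axis=0)
    theta_hat = np.linalg.solve(design_matrix(X, lam), moment)
    return int(np.argmax(X @ theta_hat))


# Toy instance (assumed): arm 1 looks best in the first half of the budget,
# but arm 0 is best with respect to the summed parameters.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])
T = 20_000
thetas = np.zeros((T, 3))
thetas[: T // 2] = [0.0, 1.0, 0.0]
thetas[T // 2:] = [2.0, 0.0, 0.0]

lam = g_optimal_design(X)
rng = np.random.default_rng(0)
best_hat = identify_best_arm(X, lam, thetas, rng)
best_true = int(np.argmax(X @ thetas.sum(axis=0)))
print("design:", np.round(lam, 3), "recommended:", best_hat, "true:", best_true)
```

Because the sampling distribution is fixed in advance, the estimate targets the time-averaged parameter regardless of how $\theta_t$ moves, which is the robustness property the abstract attributes to G-optimal sampling. The price is the slower $\exp(-T \Delta_{(1)}^2 / d)$ rate in benign, stationary instances.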
Pages: 24