Always Valid Inference: Continuous Monitoring of A/B Tests

Cited by: 17
Authors
Johari, Ramesh [1]
Koomen, Pete [2]
Pekelis, Leonid [3]
Walsh, David [4]
Affiliations
[1] Stanford Univ, Dept Management Sci & Engn, Stanford, CA 94305 USA
[2] Optimizely Inc, San Francisco, CA 94105 USA
[3] CloudTrucks Inc, San Francisco, CA 94103 USA
[4] Unlearn AI, San Francisco, CA 94105 USA
Keywords
A/B testing; p-values; sequential hypothesis testing; multiple hypothesis testing; confidence intervals; EXPECTED SAMPLE-SIZE
DOI
10.1287/opre.2021.2135
Chinese Library Classification (CLC) number
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
A/B tests are typically analyzed via frequentist p-values and confidence intervals, but these inferences are wholly unreliable if users endogenously choose sample sizes by continuously monitoring their tests. We define always valid p-values and confidence intervals that let users try to take advantage of data as fast as it becomes available, providing valid statistical inference whenever they make their decision. Always valid inference can be interpreted as a natural interface for a sequential hypothesis test, which empowers users to implement a modified test tailored to them. In particular, we show in an appropriate sense that the measures we develop trade off sample size and power efficiently, despite a lack of prior knowledge of the user's relative preference between these two goals. We also use always valid p-values to obtain multiple hypothesis testing control in the sequential context. Our methodology has been implemented in a large-scale commercial A/B testing platform to analyze hundreds of thousands of experiments to date. Copyright (C) 2021 The Author(s).
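The abstract does not spell out how an always valid p-value is computed. As a rough illustration only, the sketch below assumes one common construction: a mixture sequential probability ratio test for Gaussian observations with known variance and a normal mixing distribution centred on the null value. The function name `always_valid_p_values` and the parameters `theta0`, `sigma2`, and `tau2` are hypothetical names chosen for this example and are not taken from the record.

```python
import math
import random


def always_valid_p_values(xs, theta0=0.0, sigma2=1.0, tau2=1.0):
    """Running always valid p-values for H0: mean == theta0.

    Assumptions (illustrative, not from the record): observations are
    Gaussian with known variance sigma2, and the mixing distribution
    over the alternative mean is N(theta0, tau2).
    """
    p = 1.0          # p_0 = 1 before any data arrive
    total = 0.0
    out = []
    for n, x in enumerate(xs, start=1):
        total += x
        mean = total / n
        denom = sigma2 + n * tau2
        # Log of the mixture likelihood ratio after n observations.
        log_lr = 0.5 * math.log(sigma2 / denom) + (
            n * n * tau2 * (mean - theta0) ** 2
        ) / (2.0 * sigma2 * denom)
        # The p-value is the running minimum of 1 / likelihood ratio,
        # so it can only decrease as more data arrive.
        p = min(p, math.exp(-log_lr))
        out.append(p)
    return out


# Usage: monitor a simulated stream with a true mean shift of 0.3.
random.seed(0)
stream = [random.gauss(0.3, 1.0) for _ in range(500)]
p_values = always_valid_p_values(stream)
print(p_values[-1])
```

Because the sequence is a running minimum, a user may stop the first time it falls below a chosen level, or at any other data-dependent time, which is the behaviour the abstract describes as valid inference "whenever they make their decision".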
Pages: 1806-1821
Number of pages: 17