Always Valid Inference: Continuous Monitoring of A/B Tests

Cited by: 17

Authors:

Johari, Ramesh [1]

Koomen, Pete [2]

Pekelis, Leonid [3]

Walsh, David [4]

Affiliations:

[1] Stanford Univ, Dept Management Sci & Engn, Stanford, CA 94305 USA

[2] Optimizely Inc, San Francisco, CA 94105 USA

[3] CloudTrucks Inc, San Francisco, CA 94103 USA

[4] Unlearn AI, San Francisco, CA 94105 USA

Source:

OPERATIONS RESEARCH | 2021 / Vol. 70 / No. 3

Keywords:

A/B testing; p-values; sequential hypothesis testing; multiple hypothesis testing; confidence intervals; expected sample size

DOI:

10.1287/opre.2021.2135

Chinese Library Classification (CLC) number:

C93 [Management science]

Subject classification codes:

12; 1201; 1202; 120202

Abstract:

A/B tests are typically analyzed via frequentist p-values and confidence intervals, but these inferences are wholly unreliable if users endogenously choose sample sizes by continuously monitoring their tests. We define always valid p-values and confidence intervals that let users try to take advantage of data as fast as it becomes available, providing valid statistical inference whenever they make their decision. Always valid inference can be interpreted as a natural interface for a sequential hypothesis test, which empowers users to implement a modified test tailored to them. In particular, we show in an appropriate sense that the measures we develop trade off sample size and power efficiently, despite a lack of prior knowledge of the user's relative preference between these two goals. We also use always valid p-values to obtain multiple hypothesis testing control in the sequential context. Our methodology has been implemented in a large-scale commercial A/B testing platform to analyze hundreds of thousands of experiments to date. Copyright (C) 2021 The Author(s).
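The abstract does not spell out how an always valid p-value is computed. Below is a minimal Python sketch of one standard construction, a mixture sequential probability ratio test (mSPRT) with a normal mixing distribution, under stated assumptions: observations are i.i.d. normal with known variance `sigma2`, the null hypothesis is `theta = theta0`, and the mixing distribution is N(`theta0`, `tau2`). The class name `AlwaysValidNormalTest` and the value of `tau2` are hypothetical illustrations, not the authors' or the platform's implementation. The p-value is updated as p_n = min(p_{n-1}, 1/Λ_n), and the confidence interval is the running intersection of {θ : Λ_n(θ) < 1/α}, so the interval excludes `theta0` once p_n drops below α.

```python
import math
import random


class AlwaysValidNormalTest:
    """Hypothetical helper: mSPRT-style always valid p-value and confidence
    interval for the mean of i.i.d. N(theta, sigma2) data, sigma2 known.

    Assumption: the mixing distribution over theta is N(theta0, tau2);
    tau2 is an illustrative tuning parameter, not a recommended value.
    """

    def __init__(self, theta0: float, sigma2: float, tau2: float, alpha: float = 0.05):
        self.theta0 = theta0
        self.sigma2 = sigma2
        self.tau2 = tau2
        self.alpha = alpha
        self.n = 0
        self.total = 0.0
        self.p_value = 1.0       # p_0 = 1
        self.lower = -math.inf   # running intersection of per-step intervals
        self.upper = math.inf

    def update(self, x: float):
        """Add one observation; return (p_n, (lower_n, upper_n))."""
        self.n += 1
        self.total += x
        n = self.n
        xbar = self.total / n
        denom = self.sigma2 + n * self.tau2

        # log of the normal-mixture likelihood ratio Lambda_n evaluated at theta0
        log_lr = 0.5 * math.log(self.sigma2 / denom) + (
            n * n * self.tau2 * (xbar - self.theta0) ** 2
        ) / (2.0 * self.sigma2 * denom)

        # Always valid p-value: p_n = min(p_{n-1}, 1 / Lambda_n).
        # Using exp(-log_lr) avoids overflow when Lambda_n is very large.
        self.p_value = min(self.p_value, math.exp(-log_lr))

        # The set {theta : Lambda_n(theta) < 1/alpha} is xbar +/- radius.
        radius = math.sqrt(
            self.sigma2 * denom / (n * n * self.tau2)
            * (math.log(denom / self.sigma2) - 2.0 * math.log(self.alpha))
        )
        # Intersecting over time keeps the interval consistent with p_n.
        self.lower = max(self.lower, xbar - radius)
        self.upper = min(self.upper, xbar + radius)
        return self.p_value, (self.lower, self.upper)


if __name__ == "__main__":
    # Demo on simulated data with a true mean shift of 0.3 (illustrative only).
    random.seed(1)
    test = AlwaysValidNormalTest(theta0=0.0, sigma2=1.0, tau2=1.0)
    for i in range(2000):
        p, (lo, hi) = test.update(random.gauss(0.3, 1.0))
        if p < test.alpha:
            print(f"stop after {i + 1} observations: p={p:.4f}, CI=({lo:.3f}, {hi:.3f})")
            break
```

Because p_n can only decrease and the interval can only shrink, a user may inspect both after every observation and stop whenever p_n < α. In this sketch, `tau2` controls which effect sizes are detected fastest: a larger value favors large effects, a smaller value favors small ones.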


Pages: 1806–1821

Number of pages: 17
