False Discovery in A/B Testing

Cited by: 8
Authors
Berman, Ron [1]
Van den Bulte, Christophe [1]
Affiliations
[1] Univ Penn, Wharton Sch, Marketing, Philadelphia, PA 19104 USA
Keywords
statistics; design of experiments; decision analysis; inference; A/B testing; false discovery rate; statistical significance; power calculations; design
DOI
10.1287/mnsc.2021.4207
Chinese Library Classification number
C93 [Management science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
We investigate what fraction of all significant results in website A/B testing is actually null effects (i.e., the false discovery rate (FDR)). Our data consist of 4,964 effects from 2,766 experiments conducted on a commercial A/B testing platform. Using three different methods, we find that the FDR ranges between 28% and 37% for tests conducted at 10% significance and between 18% and 25% for tests at 5% significance (two sided). These high FDRs stem mostly from the high fraction of true null effects, about 70%, rather than from low power. Using our estimates, we also assess the potential of various A/B test designs to reduce the FDR. The two main implications are that decision makers should expect one in five interventions achieving significance at 5% confidence to be ineffective when deployed in the field and that analysts should consider using two-stage designs with multiple variations rather than basic A/B tests.
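The abstract's point that a high share of true nulls, rather than low power, drives the FDR can be illustrated with the textbook identity FDR = π0·α / (π0·α + (1 − π0)·power). The sketch below is not one of the paper's three estimation methods. The function name `false_discovery_rate` and the power values are hypothetical choices for illustration; only the roughly 70% null share and the 5% and 10% significance levels come from the abstract.

```python
def false_discovery_rate(pi0: float, alpha: float, power: float) -> float:
    """Expected share of significant results that are true nulls.

    pi0   : fraction of tested effects that are truly null
    alpha : significance level of each test
    power : probability that a true non-null effect reaches significance
    """
    if not (0.0 <= pi0 <= 1.0 and 0.0 < alpha < 1.0 and 0.0 <= power <= 1.0):
        raise ValueError("pi0 and power must lie in [0, 1]; alpha in (0, 1)")
    false_positives = pi0 * alpha            # nulls that reach significance
    true_positives = (1.0 - pi0) * power     # real effects that reach significance
    total = false_positives + true_positives
    return false_positives / total if total > 0 else 0.0


if __name__ == "__main__":
    pi0 = 0.70  # share of true nulls reported in the abstract (about 70%)
    for alpha in (0.05, 0.10):        # significance levels named in the abstract
        for power in (0.5, 0.8):      # hypothetical power values, not from the paper
            fdr = false_discovery_rate(pi0, alpha, power)
            print(f"alpha={alpha:.2f} power={power:.1f} FDR={fdr:.3f}")
```

Under this identity, raising the assumed power from 0.5 to 0.8 lowers the FDR only modestly while π0 stays at 0.70, which is consistent with the abstract attributing the high FDR mostly to the null share.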
Pages: 6762-6782
Page count: 21