Gerlovina, Inna [1]
van der Laan, Mark J. [2]
Hubbard, Alan [3]
Affiliations:
[1] Univ Calif Berkeley, Div Biostat, 101 Haviland Hall, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, 101 Haviland Hall, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Sch Publ Hlth, Div Biostat, Berkeley, CA 94720 USA
Multiple comparisons and small sample sizes, common characteristics of many types of "Big Data", including those produced by genomic studies, present specific challenges that affect the reliability of inference. Multiple testing procedures require the calculation of very small tail probabilities of a test statistic's distribution. Results based on large deviation theory provide a formal condition, linking the number of tests to the sample size, that is necessary to guarantee error rate control at practical sample sizes; this condition, however, is rarely satisfied. Using methods based on Edgeworth expansions (relying especially on the work of Peter Hall), we explore how departures of sampling distributions from typical assumptions affect actual error rates. Our investigation illustrates how far the actual error rates can be from the declared nominal levels, suggesting potentially widespread problems with error rate control, specifically excessive false positives. This is an important factor contributing to the "reproducibility crisis". We also review some other commonly used methods (such as permutation and methods based on finite sampling inequalities) as applied to multiple testing with small samples. We point out that Edgeworth expansions, which provide higher-order approximations to the sampling distribution, offer a promising direction for data analysis that could improve the reliability of studies relying on large numbers of comparisons with modest sample sizes.
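As a rough illustration of why very small tail probabilities matter here, the following Python sketch (not taken from the paper) compares the normal-approximation tail probability at a Bonferroni cutoff with a one-term Edgeworth approximation for the standardized mean of n i.i.d. observations with known variance. The function names, the choice of a one-sided Bonferroni correction, and the example values (10,000 tests, n = 20, skewness 1) are all assumptions made for illustration; the paper's own analysis may use different statistics and higher-order terms.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_upper_tail(x):
    """P(Z > x) for a standard normal Z, computed via erfc for tail accuracy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bonferroni_cutoff(alpha, num_tests):
    """One-sided cutoff z with P(Z > z) = alpha / num_tests (found by bisection)."""
    target = alpha / num_tests
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if normal_upper_tail(mid) > target:
            lo = mid  # tail still too heavy: move the cutoff outward
        else:
            hi = mid
    return 0.5 * (lo + hi)


def edgeworth_upper_tail(x, skewness, n):
    """One-term Edgeworth approximation to P(S_n > x).

    S_n is the standardized mean of n i.i.d. observations (known variance):
    P(S_n <= x) ~ Phi(x) - phi(x) * skewness * (x^2 - 1) / (6 * sqrt(n)).
    This is an approximation and is not guaranteed to lie in [0, 1].
    """
    correction = normal_pdf(x) * skewness * (x * x - 1.0) / (6.0 * math.sqrt(n))
    return normal_upper_tail(x) + correction


if __name__ == "__main__":
    # Hypothetical example values, chosen only for illustration.
    z = bonferroni_cutoff(alpha=0.05, num_tests=10_000)
    nominal = normal_upper_tail(z)
    approx_actual = edgeworth_upper_tail(z, skewness=1.0, n=20)
    print("cutoff:", z)
    print("nominal per-test level:", nominal)
    print("Edgeworth-approximated level:", approx_actual)
    print("inflation factor:", approx_actual / nominal)
```

Under these assumed values the skewness correction is several times larger than the nominal per-test level, which is the kind of gap between declared and actual error rates the abstract describes. With zero skewness the correction term vanishes and the two tail probabilities coincide.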
Csaszar, Attila G. [1, 2]
Furtenbacher, Tibor [2]
Arendas, Peter [2, 3]
Affiliations:
[1] Eotvos Lorand Univ, Inst Chem, Lab Mol Struct & Dynam, POB 32, H-1518 Budapest, Hungary
[2] MTA ELTE Complex Chem Syst Res Grp, Pazmany Peter Setany 1-A, H-1117 Budapest, Hungary
[3] Eotvos Lorand Univ, Inst Math, Dept Algebra & Number Theory, POB 120, H-1518 Budapest, Hungary
JOURNAL OF PHYSICAL CHEMISTRY A, 2016, 120 (45): 8949-8969