Is p-value < 0.05 enough? A study on the statistical evaluation of classifiers

Cited by: 2

Authors:

Neumann, Nadine M. [1]

Plastino, Alexandre [1]

Pinto Junior, Jony A. [2]

Freitas, Alex A. [3]

Affiliations:

[1] Univ Fed Fluminense, Inst Computacao, Niteroi, RJ, Brazil

[2] Univ Fed Fluminense, Dept Estat, Niteroi, RJ, Brazil

[3] Univ Kent, Sch Comp, Canterbury, Kent, England

Source:

KNOWLEDGE ENGINEERING REVIEW | 2020 / Vol. 36

Keywords:

EFFECT SIZE; CLASSIFICATION;

DOI:

10.1017/S0269888920000417

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Statistical significance analysis, based on hypothesis tests, is a common approach for comparing classifiers. However, many studies oversimplify this analysis by simply checking the condition p-value < 0.05, ignoring important concepts such as the effect size and the statistical power of the test. This problem is so worrying that the American Statistical Association has taken a strong stand on the subject, noting that although the p-value is a useful statistical measure, it has been abusively used and misinterpreted. This work highlights problems caused by the misuse of hypothesis tests and shows how the effect size and the power of the test can provide important information for better decision-making. To investigate these issues, we perform empirical studies with different classifiers and 50 datasets, using the Student's t-test and the Wilcoxon test to compare classifiers. The results show that an isolated p-value analysis can lead to wrong conclusions and that the evaluation of the effect size and the power of the test contributes to a more principled decision-making.
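The abstract's point that a p-value should be read together with the effect size and the power of the test can be illustrated with a minimal sketch. This is not the authors' code or their experimental procedure. The function names (`paired_cohens_d`, `approx_p_value`, `approx_power`) are hypothetical, the effect size used is the paired-samples Cohen's d (mean difference divided by the standard deviation of the differences), and both the p-value and the power use a normal approximation to the paired t-test, which is an assumption that is only reasonable for a fairly large number of datasets.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_cohens_d(scores_a, scores_b):
    """Paired-samples effect size: mean difference / std. dev. of differences.

    Assumes one score per dataset for each classifier, in the same order,
    and that the differences are not all identical (stdev must be non-zero).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / stdev(diffs)


def approx_p_value(d, n):
    """Two-sided p-value from a normal approximation to the paired t-test.

    The test statistic of a paired t-test equals d * sqrt(n); here it is
    compared against a standard normal instead of a t distribution.
    """
    z = abs(d) * sqrt(n)
    return 2.0 * (1.0 - NormalDist().cdf(z))


def approx_power(d, n, alpha=0.05):
    """Approximate two-sided power for a true effect size d and n datasets."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(d) * sqrt(n)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Made-up accuracies for two classifiers on five datasets (illustration only).
    acc_a = [0.81, 0.79, 0.90, 0.75, 0.88]
    acc_b = [0.80, 0.80, 0.88, 0.74, 0.87]
    d = paired_cohens_d(acc_a, acc_b)
    n = len(acc_a)
    print(f"effect size d = {d:.3f}")
    print(f"approx. p-value = {approx_p_value(d, n):.4f}")
    print(f"approx. power at this d and n = {approx_power(d, n):.3f}")
```

Reporting the three numbers side by side makes two cases visible that `p < 0.05` alone hides: a significant result with a negligible effect size, and a non-significant result from a test with too little power to detect a plausible difference. For small samples, an exact t distribution or a dedicated power routine would be more appropriate than this approximation.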


Pages: 26

