Is p-value < 0.05 enough? A study on the statistical evaluation of classifiers

Cited by: 2
Authors
Neumann, Nadine M. [1]
Plastino, Alexandre [1]
Pinto Junior, Jony A. [2]
Freitas, Alex A. [3]
Affiliations
[1] Univ Fed Fluminense, Inst Computacao, Niteroi, RJ, Brazil
[2] Univ Fed Fluminense, Dept Estat, Niteroi, RJ, Brazil
[3] Univ Kent, Sch Comp, Canterbury, Kent, England
Keywords
EFFECT SIZE; CLASSIFICATION
DOI
10.1017/S0269888920000417
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Statistical significance analysis, based on hypothesis tests, is a common approach for comparing classifiers. However, many studies oversimplify this analysis by checking only the condition p-value < 0.05, ignoring important concepts such as the effect size and the statistical power of the test. The problem is serious enough that the American Statistical Association has taken a strong stand on the subject, noting that although the p-value is a useful statistical measure, it has been widely misused and misinterpreted. This work highlights problems caused by the misuse of hypothesis tests and shows how the effect size and the power of the test can provide important information for better decision-making. To investigate these issues, we perform empirical studies with different classifiers and 50 datasets, using Student's t-test and the Wilcoxon test to compare classifiers. The results show that an isolated p-value analysis can lead to wrong conclusions and that evaluating the effect size and the power of the test contributes to more principled decision-making.
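The kind of analysis the abstract describes can be illustrated with a minimal sketch. It compares two classifiers over paired per-dataset accuracies and reports, next to the p-values of the paired t-test and the Wilcoxon signed-rank test, an effect size and an estimate of the test's power. Everything specific here is an assumption for illustration and is not taken from the paper: the function name `compare_classifiers`, the choice of Cohen's d for paired differences as the effect size, the power computed from the noncentral t distribution at the observed effect size, and the synthetic accuracy values.

```python
import numpy as np
from scipy import stats


def compare_classifiers(acc_a, acc_b, alpha=0.05):
    """Paired comparison of two classifiers over the same datasets.

    Hypothetical helper: returns p-values plus an effect size and a
    power estimate, so the p-value is not read in isolation.
    """
    acc_a = np.asarray(acc_a, dtype=float)
    acc_b = np.asarray(acc_b, dtype=float)
    diff = acc_a - acc_b
    n = diff.size

    # Hypothesis tests on the paired per-dataset scores.
    _, p_t = stats.ttest_rel(acc_a, acc_b)
    _, p_w = stats.wilcoxon(acc_a, acc_b)

    # Effect size (assumption): Cohen's d for paired samples,
    # mean difference divided by the std. dev. of the differences.
    d_z = diff.mean() / diff.std(ddof=1)

    # Power of the two-sided paired t-test at the observed effect size,
    # from the noncentral t distribution with noncentrality d_z * sqrt(n).
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    nc = d_z * np.sqrt(n)
    power = stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

    return {
        "p_ttest": float(p_t),
        "p_wilcoxon": float(p_w),
        "cohen_d": float(d_z),
        "power": float(power),
    }


# Synthetic accuracies on 50 datasets (made-up numbers, not the paper's data).
rng = np.random.default_rng(0)
acc_b = rng.uniform(0.60, 0.95, size=50)
acc_a = acc_b + rng.normal(0.005, 0.02, size=50)

for name, value in compare_classifiers(acc_a, acc_b).items():
    print(f"{name}: {value:.4f}")
```

Reading the four numbers together is the point of the sketch: a small p-value paired with a negligible effect size, or a large p-value paired with low power, calls for a more cautious conclusion than the p-value alone suggests. Power computed at the observed effect size is a simplification; a power analysis would normally fix a minimum effect size of practical interest in advance.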
Pages: 26