Parametric and nonparametric two-sample tests for feature screening in class comparison: a simulation study

Cited by: 3
Authors
Landoni, Elena [1]
Ambrogi, Federico [2]
Mariani, Luigi [1]
Miceli, Rosalba [1]
Affiliations
[1] Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
[2] University of Milan, Milan, Italy
Keywords
high-dimensional data; class comparison; location-scale problem; general two-sample problem; mixtures
DOI
10.2427/11808
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Background: Identifying a test that is sensitive to differences in location, scale and shape, in order to detect differentially expressed features between two comparison groups, is a key issue in high-dimensional studies. The most commonly used tests target differences in location, but general distributional discrepancies might be important to reveal differential biological processes.
Methods: A simulation study was conducted to compare the performance of a set of two-sample tests under different distributional patterns: Student's t, Welch's t, Wilcoxon-Mann-Whitney (WMW), Podgor-Gastwirth PG2, Cucconi, Kolmogorov-Smirnov (KS), Cramér-von Mises (CvM), Anderson-Darling (AD) and Zhang (Z_K, Z_C and Z_A) tests. The same tests were applied to a real data example.
Results: The AD, CvM, Z_A and Z_C tests proved to be the most sensitive in mixture distribution patterns, while still maintaining high power in normal distribution patterns. At best, relative to the parametric tests, the AD test showed a power loss of ~2% when comparing two normal distributions, but a gain of ~32% with mixture distributions. Accordingly, the AD test detected the greatest number of differentially expressed features in the real data application.
Conclusion: The tests for the general two-sample problem introduce a more general concept of 'differential expression', thus overcoming the limitations of tests restricted to specific moments of the feature distributions. In particular, the AD test should be considered a powerful alternative to the parametric tests for feature screening, in order to keep as many discriminative features as possible for the class prediction analysis.
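The contrast the abstract draws between location tests and tests for the general two-sample problem can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation design: the sample size, the mixture parameters, the number of replicates and the use of permutation p-values are all assumptions chosen for illustration, and the Kolmogorov-Smirnov statistic stands in for the wider family of general tests (AD, CvM, Zhang) studied in the paper. The helper names `welch_t`, `ks_stat`, `perm_pvalue` and `estimate_power` are hypothetical.

```python
import math
import random


def welch_t(x, y):
    """Absolute Welch t statistic (sensitive to a location shift)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se = math.sqrt(vx / nx + vy / ny)
    diff = abs(mx - my)
    if se == 0.0:
        return float("inf") if diff else 0.0
    return diff / se


def ks_stat(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two ECDFs."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:   # step both ECDFs past ties at v
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def perm_pvalue(x, y, stat, n_perm=200, rng=None):
    """Permutation p-value for any two-sample statistic (larger = more extreme)."""
    rng = rng or random.Random(0)
    observed = stat(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def estimate_power(stat, n_sim=100, n=30, alpha=0.05, seed=1):
    """Share of replicates rejected when x ~ N(0, 1) and y follows an
    (assumed) equal-weight mixture of N(-2, 1) and N(2, 1): same mean,
    different shape."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(rng.choice((-2, 2)), 1) for _ in range(n)]
        if perm_pvalue(x, y, stat, rng=rng) <= alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    print("Welch t power:", estimate_power(welch_t))
    print("KS power:     ", estimate_power(ks_stat))
```

Because both groups share the same mean in this assumed scenario, a statistic built on the mean difference has little to detect, whereas a statistic comparing whole empirical distributions can respond to the change in shape. Swapping the normal-versus-mixture pair for two shifted normals would show the opposite side of the trade-off described in the abstract.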
Pages: 11
Related papers
50 in total
  • [1] A comparison of nonparametric methods for multivariate two-sample tests
    Arboretti, Rosa
    Barzizza, Elena
    Ceccato, Riccardo
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2024,
  • [2] A class of two-sample nonparametric tests for panel count data
    Park, Do-Hwan
    Sun, Jianguo
    Zhao, Xingqiu
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2007, 36 (08) : 1611 - 1625
  • [3] A class of nonparametric tests for the two-sample problem based on order statistics
    Karakaya, Kadir
    Sert, Sumeyra
    Abusaif, Ihab
    Kus, Coskun
    Ng, Hon Keung Tony
    Nagaraja, Haikady N.
    JOURNAL OF NONPARAMETRIC STATISTICS, 2025, 37 (01) : 230 - 263
  • [4] Bayesian and frequentist testing for differences between two groups with parametric and nonparametric two-sample tests
    Kelter, Riko
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2021, 13 (06):
  • [5] Robust nonparametric tests for the two-sample location problem
    Fried, Roland
    Dehling, Herold
    STATISTICAL METHODS AND APPLICATIONS, 2011, 20 (04) : 409 - 422
  • [6] Nonparametric two-sample tests for increasing convex order
    Baringhaus, Ludwig
    Gruebel, Rudolf
    BERNOULLI, 2009, 15 (01) : 99 - 123
  • [7] Robust nonparametric tests for the two-sample location problem
    Roland Fried
    Herold Dehling
    Statistical Methods & Applications, 2011, 20 : 409 - 422
  • [8] A comparative study of nonparametric two-sample tests after Levene's transformation
    Neuhaeuser, Markus
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2007, 77 (06) : 517 - 526
  • [9] On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
    Ramdas, Aaditya
    Trillos, Nicolas Garcia
    Cuturi, Marco
    ENTROPY, 2017, 19 (02):
  • [10] Local significant differences from nonparametric two-sample tests
    Tarn Duong
    JOURNAL OF NONPARAMETRIC STATISTICS, 2013, 25 (03) : 635 - 645