Nonparametric cluster significance testing with reference to a unimodal null distribution

Cited by: 3
Authors
Helgeson, Erika S. [1]
Vock, David M. [1]
Bair, Eric [2]
Affiliations
[1] Univ Minnesota, Div Biostat, Minneapolis, MN 55414 USA
[2] Univ N Carolina, Dept Endodont & Biostat, Chapel Hill, NC 27515 USA
Funding
National Science Foundation (USA);
Keywords
cluster analysis; high-dimension low-sample size; hypothesis testing; unimodality; unsupervised learning; STATISTICAL SIGNIFICANCE; COVARIANCE ESTIMATION; DENSITY-ESTIMATION; IDENTIFICATION; MULTIMODALITY; VALIDATION; RELEVANT; NUMBER;
DOI
10.1111/biom.13376
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Cluster analysis is an unsupervised learning strategy that is exceptionally useful for identifying homogeneous subgroups of observations in data sets of unknown structure. However, it is challenging to determine if the identified clusters represent truly distinct subgroups rather than noise. Existing approaches for addressing this problem tend to define clusters based on distributional assumptions, ignore the inherent correlation structure in the data, or are not suited for high-dimension low-sample size (HDLSS) settings. In this paper, we propose a novel method to evaluate the significance of identified clusters by comparing the explained variation due to the clustering from the original data to that produced by clustering a unimodal reference distribution that preserves the covariance structure in the data. The reference distribution is generated using kernel density estimation, and thus, does not require that the data follow a particular distribution. By utilizing sparse covariance estimation, the method is adapted for the HDLSS setting. The approach can be used to test the null hypothesis that the data cannot be partitioned into clusters and to determine the optimal number of clusters. Simulation examples, theoretical evaluations, and applications to temporomandibular disorder research and cancer microarray data illustrate the utility of the proposed method.
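The comparison described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it assumes 2-means clustering, uses the ratio of within-cluster to total sum of squares as the measure of explained variation, and swaps in a multivariate Gaussian with the sample covariance as the unimodal reference, in place of the kernel-density-based reference and sparse covariance estimation named in the abstract. The function names `two_means_index` and `cluster_p_value` are hypothetical.

```python
import numpy as np


def two_means_index(x, rng, n_init=5, n_iter=50):
    """Within-cluster SS divided by total SS, best of several 2-means runs.

    Smaller values mean the two clusters explain more of the variation.
    """
    total_ss = ((x - x.mean(axis=0)) ** 2).sum()
    best = np.inf
    for _ in range(n_init):
        # Start from two distinct observations chosen at random.
        centers = x[rng.choice(len(x), size=2, replace=False)]
        for _ in range(n_iter):
            dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            if len(np.unique(labels)) < 2:
                break  # degenerate split; keep the current centers
            new_centers = np.array(
                [x[labels == k].mean(axis=0) for k in range(2)]
            )
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, dist.min(axis=1).sum())
    return best / total_ss


def cluster_p_value(x, n_ref=200, seed=0):
    """Monte Carlo p-value for 'the data cannot be split into two clusters'.

    ASSUMPTION: the reference is Gaussian with the sample mean and
    covariance, a simple unimodal stand-in for the kernel-density-based
    reference distribution described in the abstract.
    """
    rng = np.random.default_rng(seed)
    observed = two_means_index(x, rng)
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    ref = np.array([
        two_means_index(rng.multivariate_normal(mean, cov, size=len(x)), rng)
        for _ in range(n_ref)
    ])
    # Share of reference data sets that cluster at least as tightly.
    p_value = (1 + (ref <= observed).sum()) / (1 + n_ref)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated groups of 40 points in two dimensions.
    data = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(8, 1, (40, 2))])
    index, p = cluster_p_value(data)
    print(f"cluster index = {index:.3f}, p-value = {p:.3f}")
```

Because the reference keeps the sample covariance, an elongated but unimodal data cloud is compared against similarly elongated reference data rather than against a spherical one, which is the point of preserving the covariance structure.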
Pages: 1215-1226
Page count: 12