Partition clustering of high dimensional low sample size data based on p-values

Cited by: 7
Authors
von Borries, George [2 ]
Wang, Haiyan [1 ]
Affiliations
[1] Kansas State Univ, Dept Stat, Manhattan, KS 66506 USA
[2] Univ Brasilia, Dept Estat, IE, BR-70910900 Brasilia, DF, Brazil
Keywords
FALSE DISCOVERY RATE; VARIANCE; NUMBER; ANOVA;
DOI
10.1016/j.csda.2009.06.012
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Clustering techniques play an important role in analyzing the high dimensional data common in high-throughput screening, such as microarray and mass spectrometry data. Effective use of the high dimensionality and some replications can help to increase clustering accuracy and stability. In this article, a new partitioning algorithm with a robust distance measure is introduced to cluster variables in high dimensional low sample size (HDLSS) data that contain a large number of independent variables with a small number of replications per variable. The proposed clustering algorithm, PPCLUST, treats the data as coming from a mixture distribution and uses p-values from nonparametric rank tests of homogeneous distribution as a measure of similarity to separate the mixture components. PPCLUST is able to efficiently cluster a large number of variables in the presence of very few replications. Inheriting the robustness of rank procedures, the new algorithm is robust to outliers and invariant to monotone transformations of the data. Numerical studies and an application to microarray gene expression data from a colorectal cancer study are discussed. Published by Elsevier B.V.
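The idea of using rank-test p-values as a similarity measure can be illustrated with a small Python sketch. This is not the PPCLUST algorithm itself, whose details are not given in the abstract. It is an assumed, simplified stand-in: an exact two-sample rank-sum test computed by enumeration (feasible only for very few replications), followed by a hypothetical greedy rule that puts a variable into the first cluster it cannot be distinguished from at level `alpha`. The names `midranks`, `rank_sum_pvalue`, `greedy_partition`, and the toy data are all illustrative assumptions.

```python
from itertools import combinations
import math


def midranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_pvalue(x, y):
    """Exact two-sided rank-sum p-value, enumerating every way of
    choosing which len(x) of the pooled ranks belong to x."""
    ranks = midranks(list(x) + list(y))
    n = len(x)
    expected = n * (len(ranks) + 1) / 2
    observed = abs(sum(ranks[:n]) - expected)
    count = total = 0
    for subset in combinations(ranks, n):
        total += 1
        if abs(sum(subset) - expected) >= observed - 1e-9:
            count += 1
    return count / total


def greedy_partition(variables, alpha=0.05):
    """Hypothetical grouping rule (an assumption, not PPCLUST): join the
    first cluster whose pooled replications give a p-value above alpha."""
    clusters = []
    for idx, reps in enumerate(variables):
        for cluster in clusters:
            pooled = [v for member in cluster for v in variables[member]]
            if rank_sum_pvalue(reps, pooled) > alpha:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Toy data (made up): four variables, four replications each.
variables = [
    [1.0, 1.2, 0.9, 1.1],
    [5.0, 5.3, 4.8, 5.1],
    [1.05, 0.95, 1.15, 1.25],
    [4.9, 5.2, 5.05, 5.4],
]
print(greedy_partition(variables))  # -> [[0, 2], [1, 3]]

# Ranks are unchanged by strictly monotone maps, so the p-value is too.
p_raw = rank_sum_pvalue(variables[0], variables[1])
p_exp = rank_sum_pvalue([math.exp(v) for v in variables[0]],
                        [math.exp(v) for v in variables[1]])
print(p_raw == p_exp)  # -> True
```

Because only ranks enter the test, a single extreme replication cannot move a rank sum beyond its bounded range, which is the sense in which a rank-based similarity resists outliers. The enumeration grows combinatorially, so a sketch like this is only practical at the very small replication counts typical of HDLSS data.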
Pages: 3987-3998
Number of pages: 12