Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data

Cited by: 18
Authors
Serra, Angela [1]
Coretto, Pietro [2]
Fratello, Michele [3]
Tagliaferri, Roberto [1]
Affiliations
[1] Univ Salerno, Dept Management & Innovat Syst, NeuRoNeLab, I-84084 Fisciano, Sa, Italy
[2] Univ Salerno, Dept Econ & Stat, STATLAB, I-84084 Fisciano, Sa, Italy
[3] Second Univ Napoli, Dept Med Surg Neurol Metab & Ageing Sci, Piazza Luigi Miraglia 2, I-80138 Naples, Italy
Keywords
GENE-EXPRESSION DATA; CLUSTER-ANALYSIS; COVARIANCE; SELECTION; NUMBER; NOISE;
DOI
10.1093/bioinformatics/btx642
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Microarray technology can be used to study the expression of thousands of genes across a number of different experimental conditions, usually hundreds. The underlying principle is that genes sharing similar expression patterns across different samples can be part of the same co-expression system, or they may share the same biological functions. Groups of genes are usually identified by cluster analysis. Clustering methods rely on the similarity matrix between genes, and a common way to measure similarity is to compute the sample correlation matrix. Dimensionality reduction is another popular data analysis task that is also based on covariance/correlation matrix estimates. Unfortunately, covariance/correlation matrix estimation suffers from the intrinsic noise present in high-dimensional data. Sources of noise are sampling variation, the presence of outlying sample units, and the fact that in most cases the number of units is much smaller than the number of genes.
Results: In this paper, we propose a robust correlation matrix estimator that is regularized based on adaptive thresholding. The resulting method jointly tames the effects of high dimensionality and data contamination. Computations are easy to implement and do not require manual tuning. Both simulated and real data are analyzed. A Monte Carlo experiment shows that the proposed method performs remarkably well. Our correlation metric is more robust to outliers than the existing alternatives in two gene expression datasets. It is also shown how the regularization makes it possible to automatically detect and filter spurious correlations. The same regularization is also extended to other, less robust correlation measures. Finally, we apply the ARACNE algorithm to the SynTReN gene expression data. Sensitivity and specificity of the reconstructed network are compared with the gold standard. We show that ARACNE performs better when it takes the proposed correlation matrix estimator as input.
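The general idea in the abstract, a robust pairwise correlation followed by thresholding of small entries, can be illustrated with a short sketch. This is not the authors' estimator: the MAD-based correlation below (built on the Gnanadesikan-Kettenring identity) is one common robust choice, and the threshold `t` is fixed by hand for illustration, whereas the abstract describes an adaptive, data-driven threshold. The function names `robust_corr` and `hard_threshold` are hypothetical.

```python
import numpy as np

def mad(x):
    """Median absolute deviation, scaled to be consistent for Gaussian data."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def robust_corr(X):
    """Pairwise MAD-based correlation of the columns of X (n_samples x n_genes).

    Assumption: this is a generic robust correlation, not necessarily the
    estimator proposed in the paper. Columns must have non-zero MAD.
    """
    n, p = X.shape
    med = np.median(X, axis=0)
    scale = np.array([mad(X[:, j]) for j in range(p)])
    Z = (X - med) / scale  # robustly standardized columns
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            s_plus = mad(Z[:, j] + Z[:, k]) ** 2
            s_minus = mad(Z[:, j] - Z[:, k]) ** 2
            # Ratio lies in [-1, 1] because both terms are non-negative.
            R[j, k] = R[k, j] = (s_plus - s_minus) / (s_plus + s_minus)
    return R

def hard_threshold(R, t):
    """Set off-diagonal entries with |r| < t to zero (t is hand-picked here)."""
    S = np.where(np.abs(R) >= t, R, 0.0)
    np.fill_diagonal(S, 1.0)
    return S

# Toy data: genes 0 and 1 are co-expressed, gene 2 is independent.
rng = np.random.default_rng(0)
n = 500
g0 = rng.normal(size=n)
g1 = g0 + 0.3 * rng.normal(size=n)
g2 = rng.normal(size=n)
X = np.column_stack([g0, g1, g2])
X[:5, 0] = 50.0  # a few contaminated samples that would create
X[:5, 2] = 50.0  # a spurious Pearson correlation between genes 0 and 2

R = robust_corr(X)
S = hard_threshold(R, t=0.4)
print(np.round(R, 2))
print(np.round(S, 2))
```

In this toy setting the contaminated samples strongly inflate the ordinary Pearson correlation between genes 0 and 2, while the median-based estimate is barely affected and the thresholding step then removes the remaining small entry.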
Pages: 625-634
Number of pages: 10
Related papers
50 in total
  • [1] Robust sparse precision matrix estimation for high-dimensional compositional data
    Liang, Wanfeng
    Wu, Yue
    Ma, Xiaoyan
    [J]. STATISTICS & PROBABILITY LETTERS, 2022, 184
  • [2] Estimation of high-dimensional sparse cross correlation matrix
    Cao, Yin
    Seo, Kwangok
    Ahn, Soohyun
    Lim, Johan
    [J]. COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS, 2022, 29 (06) : 655 - 664
  • [3] Sparse estimation of high-dimensional correlation matrices
    Cui, Ying
    Leng, Chenlei
    Sun, Defeng
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2016, 93 : 390 - 403
  • [4] Robust estimator of the correlation matrix with sparse Kronecker structure for a high-dimensional matrix-variate
    Niu, Lu
    Liu, Xiumin
    Zhao, Junlong
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2020, 177
  • [5] Robust Covariance Matrix Estimation for High-Dimensional Compositional Data with Application to Sales Data Analysis
    Li, Danning
    Srinivasan, Arun
    Chen, Qian
    Xue, Lingzhou
    [J]. JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 2023, 41 (04) : 1090 - 1100
  • [6] A new robust covariance matrix estimation for high-dimensional microbiome data
    Wang, Jiyang
    Liang, Wanfeng
    Li, Lijie
    Wu, Yue
    Ma, Xiaoyan
    [J]. AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, 2024, 66 (02) : 281 - 295
  • [7] Sparse covariance matrix estimation in high-dimensional deconvolution
    Belomestny, Denis
    Trabs, Mathias
    Tsybakov, Alexandre B.
    [J]. BERNOULLI, 2019, 25 (03) : 1901 - 1938
  • [8] High-dimensional correlation matrix estimation for Gaussian data: a Bayesian perspective
    Wang, Chaojie
    Fan, Xiaodan
    [J]. STATISTICS AND ITS INTERFACE, 2021, 14 (03) : 351 - 358
  • [9] Fast Robust Correlation for High-Dimensional Data
    Raymaekers, Jakob
    Rousseeuw, Peter J.
    [J]. TECHNOMETRICS, 2021, 63 (02) : 184 - 198
  • [10] Robust estimation of a high-dimensional integrated covariance matrix
    Morimoto, Takayuki
    Nagata, Shuichi
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2017, 46 (02) : 1102 - 1112