A new robust covariance matrix estimation for high-dimensional microbiome data

Times cited: 0
Authors
Wang, Jiyang [1, 2]
Liang, Wanfeng [3]
Li, Lijie [1]
Wu, Yue [1]
Ma, Xiaoyan [4]
Affiliations
[1] Nankai Univ, Sch Stat & Data Sci, Tianjin 300071, Peoples R China
[2] Xinjiang Univ, Coll Math & Syst Sci, Urumqi 830046, Xinjiang, Peoples R China
[3] Dongbei Univ Finance & Econ, Sch Data Sci & Artificial Intelligence, Dalian 116025, Liaoning, Peoples R China
[4] Ningxia Univ, Sch Math & Stat, Yinchuan 750021, Ningxia, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
centred log-ratio; covariance matrix; high dimension; microbiome data; robustness; thresholding; OPTIMAL RATES; COMPOSITIONAL DATA; GUT MICROBIOME; CONVERGENCE; PATTERNS; OBESITY
DOI
10.1111/anzs.12415
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Microbiome data typically lie in a high-dimensional simplex. A key question in metagenomic analysis is how to exploit the covariance structure of such data. In this paper, a framework called approximate-estimate-threshold (AET) is developed for robust estimation of the basis covariance matrix of high-dimensional microbiome data. Specifically, we first construct a proxy matrix $\boldsymbol{\Gamma}$ that is almost indistinguishable from the true basis covariance matrix $\boldsymbol{\Sigma}$. Then any estimator $\hat{\boldsymbol{\Gamma}}$ satisfying certain conditions can be used to estimate $\boldsymbol{\Gamma}$. Finally, we apply a thresholding step to $\hat{\boldsymbol{\Gamma}}$ to obtain the final estimator $\hat{\boldsymbol{\Sigma}}$. In particular, this paper uses a Huber-type estimator $\hat{\boldsymbol{\Gamma}}$, which achieves robustness while requiring only bounded $(2+\epsilon)$-th moments for some $\epsilon \in (0,2]$. We derive the convergence rate of $\hat{\boldsymbol{\Sigma}}$ under the spectral norm and provide theoretical guarantees for support recovery. Extensive simulations and a real-data example illustrate the empirical performance of our method.
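The abstract describes AET only at a high level, so the following is a minimal pure-Python sketch of the three steps under explicit assumptions, not the authors' implementation. Assumption 1: the proxy $\boldsymbol{\Gamma}$ is taken to be the covariance of centred log-ratio (clr) coordinates, as the keywords suggest. If a composition $\mathbf{x}$ is a normalized positive basis $\mathbf{w}$, then $\operatorname{clr}(\mathbf{x}) = \mathbf{G}\log\mathbf{w}$ with $\mathbf{G} = \mathbf{I}_p - p^{-1}\mathbf{1}\mathbf{1}^\top$, so the clr covariance equals $\mathbf{G}\boldsymbol{\Sigma}\mathbf{G}$. Assumption 2: the Huber-type step is one simple elementwise variant (Huber location estimates for centring, then Huber means of cross-products). Assumption 3: thresholding is soft thresholding of off-diagonal entries at a single level. The names `clr`, `huber_mean`, `soft_threshold`, `aet_sketch` and the tuning parameters `tau` and `lam` are hypothetical, and how the paper tunes its parameters is not stated in this abstract.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def clr(parts: List[float]) -> List[float]:
    """Centred log-ratio transform of one strictly positive composition."""
    if any(x <= 0 for x in parts):
        raise ValueError("clr needs strictly positive parts (e.g. add a pseudocount to zeros)")
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def huber_mean(values: List[float], tau: float, max_iter: int = 200) -> float:
    """Huber M-estimate of location via iteratively reweighted averaging.

    Points within tau of the current estimate get weight 1; farther points get
    weight tau / |residual|, which caps their influence on the estimate.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    mu = sorted(values)[len(values) // 2]  # start from a median-type value
    for _ in range(max_iter):
        weights = [1.0 if abs(v - mu) <= tau else tau / abs(v - mu) for v in values]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        converged = abs(new_mu - mu) < 1e-12
        mu = new_mu
        if converged:
            break
    return mu


def soft_threshold(x: float, lam: float) -> float:
    """Shrink x toward zero by lam; entries with |x| <= lam become exactly 0."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def aet_sketch(compositions: Matrix, tau: float, lam: float) -> Matrix:
    """Hypothetical approximate-estimate-threshold pipeline (illustration only)."""
    # Step 1 (approximate, assumed): move every sample to clr coordinates.
    z = [clr(row) for row in compositions]
    n, p = len(z), len(z[0])
    # Step 2 (estimate): Huber centring per coordinate, then Huber means of products.
    mu = [huber_mean([z[i][j] for i in range(n)], tau) for j in range(p)]
    gamma_hat = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            prods = [(z[i][j] - mu[j]) * (z[i][k] - mu[k]) for i in range(n)]
            gamma_hat[j][k] = gamma_hat[k][j] = huber_mean(prods, tau)
    # Step 3 (threshold): keep the diagonal, soft-threshold off-diagonal entries.
    return [
        [gamma_hat[j][k] if j == k else soft_threshold(gamma_hat[j][k], lam) for k in range(p)]
        for j in range(p)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 50 samples of a 5-part composition from log-normal basis abundances.
    basis = [[rng.lognormvariate(0.0, 1.0) for _ in range(5)] for _ in range(50)]
    comps = [[w / sum(row) for w in row] for row in basis]
    sigma_hat = aet_sketch(comps, tau=2.0, lam=0.1)
    for row in sigma_hat:
        print(" ".join(f"{v:7.3f}" for v in row))
```

Pure Python keeps the sketch dependency-free; its cost is $O(np^2)$ Huber fits, so a practical version would vectorize the products and might use hard or entry-adaptive thresholds instead of a single soft-threshold level.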
Pages: 281-295
Number of pages: 15
Related papers
  • [21] Regularization for high-dimensional covariance matrix
    Cui, Xiangzhao
    Li, Chun
    Zhao, Jine
    Zeng, Li
    Zhang, Defei
    Pan, Jianxin
    [J]. SPECIAL MATRICES, 2016, 4(01): 189-201
  • [22] Estimation and optimal structure selection of high-dimensional Toeplitz covariance matrix
    Yang, Yihe
    Zhou, Jie
    Pan, Jianxin
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2021, 184
  • [23] On Coupling Robust Estimation with Regularization for High-Dimensional Data
    Kalina, Jan
    Hlinka, Jaroslav
    [J]. DATA SCIENCE: INNOVATIVE DEVELOPMENTS IN DATA ANALYSIS AND CLUSTERING, 2017: 15-27
  • [24] Simultaneous testing of mean vector and covariance matrix for high-dimensional data
    Liu, Zhongying
    Liu, Baisen
    Zheng, Shurong
    Shi, Ning-Zhong
    [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2017, 188: 82-93
  • [25] Estimation of high-dimensional integrated covariance matrix based on noisy high-frequency data with multiple observations
    Wang, Moming
    Xia, Ningning
    [J]. STATISTICS & PROBABILITY LETTERS, 2021, 170
  • [26] ROBUST ESTIMATION OF THE MEAN AND COVARIANCE MATRIX FOR HIGH DIMENSIONAL TIME SERIES
    Zhang, Danna
    [J]. STATISTICA SINICA, 2021, 31(02): 797-820
  • [27] Testing identity of high-dimensional covariance matrix
    Wang, Hao
    Liu, Baisen
    Shi, Ning-Zhong
    Zheng, Shurong
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2018, 88(13): 2600-2611
  • [28] Adaptive banding covariance estimation for high-dimensional multivariate longitudinal data
    Qian, Fang
    Zhang, Weiping
    Chen, Yu
    [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2021, 49(03): 906-938
  • [29] Ridge estimation of inverse covariance matrices from high-dimensional data
    van Wieringen, Wessel N.
    Peeters, Carel F. W.
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2016, 103: 284-303
  • [30] Variable selection in multivariate linear models with high-dimensional covariance matrix estimation
    Perrot-Dockes, Marie
    Levy-Leduc, Celine
    Sansonnet, Laure
    Chiquet, Julien
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2018, 166: 78-97