High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis

Cited by: 36
Authors
Daye, Z. John [1]
Chen, Jinbo [1]
Li, Hongzhe [1]
Affiliation
[1] Univ Penn, Sch Med, Dept Biostat & Epidemiol, Philadelphia, PA 19104 USA
Funding
National Institutes of Health (USA)
Keywords
Generalized least squares; Heteroscedasticity; Large p small n; Model selection; Sparse regression; Variance estimation; Variable selection; Shrinkage
DOI
10.1111/j.1541-0420.2011.01652.x
Chinese Library Classification
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
We consider the problem of high-dimensional regression under nonconstant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows nonconstant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in and apply our method to an expression quantitative trait loci (eQTLs) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and lead to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.
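The idea of modeling the mean and the variance jointly, each with its own sparsity penalty, can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a log-linear variance model, log(sigma_i^2) = a0 + x_i'alpha, and swaps in a simpler variance step (a lasso regression of log squared residuals on the predictors) alternated with a weighted lasso for the mean. The function names, penalty values, and simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso(X, y, w, lam, n_iter=200):
    """Minimize (1/2n) * sum_i w_i (y_i - x_i'b)^2 + lam * ||b||_1
    by cyclic coordinate descent (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()          # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            xj = X[:, j]
            denom = np.sum(w * xj * xj) / n
            if denom == 0.0:
                continue
            # Correlation of x_j with the partial residual (x_j's effect added back)
            rho = np.sum(w * xj * (r + xj * b[j])) / n
            new_bj = soft_threshold(rho, lam) / denom
            r += xj * (b[j] - new_bj)   # keep the residual in sync with b
            b[j] = new_bj
    return b

def fit_mean_and_variance(X, y, lam_mean, lam_var, n_outer=5):
    """Alternate a weighted lasso for the mean with a lasso on log squared
    residuals for the variance (illustrative assumption, see lead-in)."""
    n = X.shape[0]
    w = np.ones(n)                      # start from the homoscedastic fit
    Xc = X - X.mean(axis=0)             # centering gives an unpenalized intercept
    for _ in range(n_outer):
        beta = weighted_lasso(X, y, w, lam_mean)
        z = np.log((y - X @ beta) ** 2 + 1e-8)
        alpha = weighted_lasso(Xc, z - z.mean(), np.ones(n), lam_var)
        log_var = z.mean() + Xc @ alpha
        w = np.exp(-log_var)            # inverse-variance weights
        w /= w.mean()                   # keep the penalty scale comparable
    return beta, alpha

# Simulated example: predictors 0 and 1 drive the mean, predictor 2 the variance.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -1.5]
alpha_true = np.zeros(p)
alpha_true[2] = 1.5
y = X @ beta_true + np.exp(X @ alpha_true / 2) * rng.standard_normal(n)

beta_hat, alpha_hat = fit_mean_and_variance(X, y, lam_mean=0.1, lam_var=0.1)
print("nonzero mean coefficients:", np.flatnonzero(beta_hat))
print("nonzero variance coefficients:", np.flatnonzero(alpha_hat))
```

In this sketch the two penalties `lam_mean` and `lam_var` play the role of the two regularization levels; in practice they would be tuned, for example by cross-validation or an information criterion.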
Pages: 316-326
Number of pages: 11