High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis

Times cited: 36
Authors
Daye, Z. John [1]
Chen, Jinbo [1]
Li, Hongzhe [1]
Affiliation
[1] University of Pennsylvania, School of Medicine, Department of Biostatistics and Epidemiology, Philadelphia, PA 19104, USA
Funding
National Institutes of Health (USA)
Keywords
Generalized least squares; Heteroscedasticity; Large p small n; Model selection; Sparse regression; Variance estimation; Variable selection; Shrinkage
DOI
10.1111/j.1541-0420.2011.01652.x
Chinese Library Classification (CLC)
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
We consider the problem of high-dimensional regression under nonconstant error variances. Although heteroscedasticity is a common phenomenon in biological applications, it has so far been largely ignored in the high-dimensional analysis of genomic data sets. We propose a new methodology that allows nonconstant error variances in high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that the proposed procedure can yield better estimation and variable selection than existing methods when heteroscedasticity arises from predictors that explain the error variances and from outliers. Further, we demonstrate the presence of heteroscedasticity in an expression quantitative trait loci (eQTL) study of 112 yeast segregants and apply our method to it. The new procedure can automatically account for heteroscedasticity when identifying the eQTLs associated with gene expression variation, and it leads to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.
Pages: 316-326
Page count: 11
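The abstract describes the method only at a high level: the mean and the error variance are modeled simultaneously, and each component carries its own penalty. The sketch below is an illustrative reading of that idea, not the authors' algorithm. It assumes Gaussian errors, a linear mean model mu_i = x_i . beta, a log-linear variance model log sigma_i^2 = z_i . theta, and L1 penalties with tuning parameters lam and gamma. All function names are hypothetical. The sketch shows the doubly penalized objective and the mean update for a fixed theta. For fixed theta, that update reduces to a weighted (generalized least squares) lasso with weights w_i = exp(-z_i . theta).

```python
import math


def soft_threshold(value, threshold):
    """Soft-thresholding operator: sign(value) * max(|value| - threshold, 0)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(u * v for u, v in zip(a, b))


def variance_weights(Z, theta):
    """Hypothetical helper: precision weights w_i = 1 / sigma_i^2 = exp(-z_i . theta)
    under the ASSUMED log-linear variance model log sigma_i^2 = z_i . theta."""
    return [math.exp(-dot(zi, theta)) for zi in Z]


def penalized_nll(y, X, Z, beta, theta, lam, gamma):
    """Assumed doubly penalized Gaussian negative log-likelihood (constant dropped):

    0.5 * sum_i [z_i . theta + (y_i - x_i . beta)^2 * exp(-z_i . theta)]
        + lam * ||beta||_1 + gamma * ||theta||_1
    """
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        eta = dot(zi, theta)          # log-variance of observation i
        resid = yi - dot(xi, beta)    # residual under the mean model
        total += 0.5 * (eta + resid ** 2 * math.exp(-eta))
    total += lam * sum(abs(b) for b in beta)      # penalty on mean coefficients
    total += gamma * sum(abs(t) for t in theta)   # penalty on variance coefficients
    return total


def weighted_lasso_step(y, X, weights, lam, beta, n_sweeps=100):
    """Hypothetical mean step for fixed theta: cyclic coordinate descent on
    0.5 * sum_i w_i * (y_i - x_i . beta)^2 + lam * ||beta||_1."""
    beta = list(beta)
    p = len(beta)
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0    # weighted inner product of x_j with the partial residual
            denom = 0.0  # weighted squared norm of x_j
            for yi, xi, wi in zip(y, X, weights):
                partial = yi - sum(xi[k] * beta[k] for k in range(p) if k != j)
                rho += wi * xi[j] * partial
                denom += wi * xi[j] ** 2
            beta[j] = soft_threshold(rho, lam) / denom if denom > 0 else 0.0
    return beta


if __name__ == "__main__":
    # Toy data: the third observation is an outlier.
    y = [1.0, 1.0, 10.0]
    X = [[1.0], [1.0], [1.0]]
    # Equal weights (homoscedastic fit): the estimate is pulled toward the outlier.
    print(weighted_lasso_step(y, X, [1.0, 1.0, 1.0], lam=0.0, beta=[0.0]))  # [4.0]
    # Down-weighting the high-variance observation moves the estimate back toward 1.
    print(weighted_lasso_step(y, X, [1.0, 1.0, 0.01], lam=0.0, beta=[0.0]))
```

A complete alternating scheme would also need a theta step. That step would refit the penalized log-variance model to the current residuals, and the two steps would then be repeated until the objective stops decreasing. The theta step is omitted here, and the abstract does not specify how the authors carry it out.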