For high-dimensional hierarchical models, consider exchangeability of effects across covariates instead of across datasets

Cited by: 0
Authors
Trippe, Brian L. [1]
Finucane, Hilary K. [2]
Broderick, Tamara [1]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Broad Inst, Cambridge, MA USA
Keywords
SEEMINGLY UNRELATED REGRESSION; EMPIRICAL BAYES ESTIMATORS; VARIABLE SELECTION; PREDICTION; DISEASES; RISK;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hierarchical Bayesian methods enable information sharing across regression problems on multiple groups of data. While standard practice is to model regression parameters (effects) as (1) exchangeable across the groups and (2) correlated to differing degrees across covariates, we show that this approach exhibits poor statistical performance when the number of covariates exceeds the number of groups. For instance, in statistical genetics, we might regress dozens of traits (defining groups) for thousands of individuals (responses) on up to millions of genetic variants (covariates). When an analyst has more covariates than groups, we argue that it is often preferable to instead model effects as (1) exchangeable across covariates and (2) correlated to differing degrees across groups. To this end, we propose a hierarchical model expressing our alternative perspective. We devise an empirical Bayes estimator for learning the degree of correlation between groups. We develop theory that demonstrates that our method outperforms the classic approach when the number of covariates dominates the number of groups, and corroborate this result empirically on several high-dimensional multiple regression and classification problems.
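The modeling choice described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Gaussian noise with a known variance, a zero-mean Gaussian prior in which each covariate's vector of effects across groups is drawn independently with a shared group-by-group covariance `Sigma`, and a `Sigma` supplied by hand rather than learned by the empirical Bayes estimator the paper proposes. The function name `posterior_mean_effects` and all parameter names are hypothetical.

```python
import numpy as np


def posterior_mean_effects(Xs, ys, Sigma, noise_var=1.0):
    """Posterior mean of a D x Q effect matrix (D covariates, Q groups).

    Assumed model (illustrative only):
      y_q = X_q @ beta_q + noise,  noise ~ N(0, noise_var * I)
      each row of the effect matrix (one covariate across all groups)
      is drawn i.i.d. from N(0, Sigma), with Sigma of shape Q x Q.
    """
    Q = len(Xs)
    D = Xs[0].shape[1]
    # With effects stacked group by group, the prior covariance is
    # kron(Sigma, I_D), so the prior precision is kron(inv(Sigma), I_D).
    prior_prec = np.kron(np.linalg.inv(Sigma), np.eye(D))
    lik_prec = np.zeros((Q * D, Q * D))
    rhs = np.zeros(Q * D)
    for q, (X, y) in enumerate(zip(Xs, ys)):
        block = slice(q * D, (q + 1) * D)
        lik_prec[block, block] = X.T @ X / noise_var
        rhs[block] = X.T @ y / noise_var
    stacked = np.linalg.solve(prior_prec + lik_prec, rhs)
    # Row q of the reshaped array is group q's effects; transpose to D x Q.
    return stacked.reshape(Q, D).T


# Small synthetic demo: more covariates (D) than groups (Q).
rng = np.random.default_rng(0)
Q, D, N = 3, 20, 10
Sigma = 0.9 * np.ones((Q, Q)) + 0.1 * np.eye(Q)  # strongly correlated groups
Xs = [rng.normal(size=(N, D)) for _ in range(Q)]
ys = [rng.normal(size=N) for _ in range(Q)]
B_hat = posterior_mean_effects(Xs, ys, Sigma)
print(B_hat.shape)
```

When `Sigma` is diagonal, the groups decouple and the result reduces to a separate ridge regression per group; off-diagonal entries are what let one group's data inform another's effects.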
Pages: 14