Group spike-and-slab lasso generalized linear models for disease prediction and associated genes detection by incorporating pathway information

Cited by: 17
Authors
Tang, Zaixiang [1, 2, 3, 4]
Shen, Yueping [1, 2]
Li, Yan [4 ]
Zhang, Xinyan [4 ]
Wen, Jia [5 ]
Qian, Chen'ao [6 ]
Zhuang, Wenzhuo [7 ]
Shi, Xinghua [5 ]
Yi, Nengjun [4 ]
Affiliations
[1] Soochow Univ, Med Coll, Sch Publ Hlth, Dept Biostat, Suzhou 215123, Peoples R China
[2] Soochow Univ, Med Coll, Jiangsu Key Lab Prevent & Translat Med Geriatr Di, Suzhou 215123, Peoples R China
[3] Soochow Univ, Med Coll, Ctr Genet Epidemiol & Genom, Suzhou 215123, Peoples R China
[4] Univ Alabama Birmingham, Dept Biostat, Birmingham, AL 35294 USA
[5] Univ North Carolina Charlotte, Dept Bioinformat & Genom, Charlotte, NC 28223 USA
[6] Soochow Univ, Sch Biol & Basic Med Sci, Dept Bioinformat, Suzhou 215123, Peoples R China
[7] Soochow Univ, Sch Biol & Basic Med Sci, Dept Cell Biol, Suzhou 215123, Peoples R China
Funding
US National Science Foundation; National Natural Science Foundation of China; US National Institutes of Health
Keywords
VARIABLE SELECTION; REGULARIZATION PATHS; REGRESSION;
DOI
10.1093/bioinformatics/btx684
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Large-scale molecular data are increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, standard approaches to omics data analysis ignore the group structure among genes that is encoded in functional relationships or pathway information.
Results: We propose new Bayesian hierarchical generalized linear models (GLMs), called group spike-and-slab lasso GLMs, for predicting disease outcomes and detecting associated genes by incorporating large-scale molecular data and group structures. The proposed model employs a mixture double-exponential prior on the coefficients, which induces a self-adaptive amount of shrinkage on different coefficients. Group information is incorporated into the model through group-specific parameters. We have developed a fast and stable deterministic algorithm to fit the proposed hierarchical GLMs, which can perform variable selection within groups. We assess the performance of the proposed method in several simulated scenarios by varying the overlap among groups, the group size, the number of non-null groups, and the within-group correlation. Compared with existing methods, the proposed method provides not only more accurate parameter estimates but also better prediction. We further demonstrate the proposed procedure on three cancer datasets using pathway structures of genes. Our results show that the proposed method generates powerful models for predicting disease outcomes and detecting associated genes.
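The abstract does not give formulas, so the following is only a minimal sketch of how a mixture double-exponential (spike-and-slab) prior can produce coefficient-specific shrinkage. It assumes a narrow "spike" scale `s0`, a wide "slab" scale `s1`, and a group-specific inclusion probability `theta`; these symbols, the example values, and the pathway names are illustrative assumptions and are not taken from this record.

```python
import math


def de_density(beta, scale):
    """Double-exponential (Laplace) density centred at 0 with the given scale."""
    return math.exp(-abs(beta) / scale) / (2.0 * scale)


def inclusion_probability(beta, theta, s0, s1):
    """Probability that beta was drawn from the slab (scale s1) rather than
    the spike (scale s0), given a prior slab probability theta."""
    slab = theta * de_density(beta, s1)
    spike = (1.0 - theta) * de_density(beta, s0)
    return slab / (slab + spike)


def adaptive_penalty(beta, theta, s0, s1):
    """Expected inverse scale, which acts as a per-coefficient L1 penalty weight.
    It lies between 1/s1 (weak shrinkage) and 1/s0 (strong shrinkage)."""
    p = inclusion_probability(beta, theta, s0, s1)
    return (1.0 - p) / s0 + p / s1


# Hypothetical settings: one theta per gene group (assumed values).
s0, s1 = 0.05, 1.0
theta_by_group = {"pathway_A": 0.5, "pathway_B": 0.1}

for group, theta in theta_by_group.items():
    for beta in (0.01, 0.5):
        weight = adaptive_penalty(beta, theta, s0, s1)
        print(f"{group}: beta={beta}, penalty weight={weight:.3f}")
```

Under these assumptions, a coefficient near zero receives a penalty weight close to `1/s0`, while a large coefficient receives one close to `1/s1`; a group with a smaller `theta` penalizes its members more heavily for the same coefficient value.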
Pages: 901-910
Page count: 10