A non-negative spike-and-slab lasso generalized linear stacking prediction modeling method for high-dimensional omics data

Cited by: 0
Authors
Shen, Junjie [1]
Wang, Shuo [2,3]
Dong, Yongfei [1]
Sun, Hao [1]
Wang, Xichao [1]
Tang, Zaixiang [1]
Affiliations
[1] Soochow Univ, Sch Publ Hlth, Jiangsu Key Lab Prevent & Translat Med Geriatr Dis, Dept Biostat,Suzhou Med Coll,MOE Key Lab Geriatr D, 199 Renai Rd, Suzhou 215123, Jiangsu, Peoples R China
[2] Univ Freiburg, Inst Med Biometry & Stat, Fac Med, D-79085 Freiburg, Germany
[3] Univ Freiburg, Med Ctr, D-79085 Freiburg, Germany
Funding
National Natural Science Foundation of China;
Keywords
Stacking Bayesian method; Non-negative spike-and-slab prior; Omics segmentation; VARIABLE SELECTION; R PACKAGE; REGRESSION; REGULARIZATION; APPROXIMATION; EXPRESSION; GENES;
DOI
10.1186/s12859-024-05741-6
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: High-dimensional omics data are increasingly used in clinical and public health research for disease risk prediction. Many sparse methods have been proposed that use prior knowledge, e.g., biological group structure information, to guide the model-building process. However, these methods still rely on a single model, which often leads to overconfident inferences and inferior generalization.

Results: We propose a novel stacking strategy based on a non-negative spike-and-slab Lasso (nsslasso) generalized linear model (GLM) for disease risk prediction with high-dimensional omics data. Briefly, we used prior biological knowledge to segment the omics data into a set of sub-datasets, one per group. Each sub-model was trained separately on the features of its group using a suitable base learner. The sub-model predictions were then combined by a super learner, the nsslasso GLM. The proposed method was compared with several competitors, such as the Lasso, grlasso, and gsslasso, on simulated data and two open-access breast cancer datasets. The proposed method showed robustly superior prediction performance to the best single-model method on high-noise simulated data and on the real-world data. Furthermore, compared with the traditional stacking method, the proposed nsslasso stacking method can efficiently handle redundant sub-models and identify important ones.

Conclusions: The proposed nsslasso method demonstrated favorable predictive accuracy, stability, and biological interpretability. The method can also be used to detect new biomarkers and key group structures.
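The stacking workflow described in the abstract (segment features by group, fit one sub-model per group, combine sub-model predictions with non-negative sparse weights) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a Gaussian outcome for brevity, uses least squares as the base learner, and substitutes an ordinary non-negative lasso for the nsslasso prior. The group names, simulated data, and all function names are hypothetical.

```python
import random

def fit_ols(X, y, steps=200, lr=0.1):
    """Least-squares base learner fitted by gradient descent (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(steps):
        resid = [sum(b * x for b, x in zip(beta, row)) - yi
                 for row, yi in zip(X, y)]
        for j in range(p):
            beta[j] -= lr * sum(r * row[j] for r, row in zip(resid, X)) / n
    return beta

def predict(X, beta):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def out_of_fold(X, y, k=5):
    """Cross-validated predictions, so the super learner sees no in-sample fits."""
    n = len(X)
    oof = [0.0] * n
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        for i, pred in zip(test, predict([X[i] for i in test], beta)):
            oof[i] = pred
    return oof

def nonneg_lasso(Z, y, lam=0.05, sweeps=100):
    """Coordinate descent for min (1/2n)||y - Zw||^2 + lam * sum(w), w >= 0.
    Assumption: a plain non-negative lasso stands in for the nsslasso prior."""
    n, m = len(Z), len(Z[0])
    w = [0.0] * m
    for _ in range(sweeps):
        for j in range(m):
            # Covariance of sub-model j with the residual of the other sub-models
            rho = sum(
                Z[i][j] * (y[i] - sum(w[l] * Z[i][l] for l in range(m) if l != j))
                for i in range(n)
            ) / n
            zz = sum(Z[i][j] ** 2 for i in range(n)) / n
            w[j] = max(0.0, rho - lam) / zz if zz > 0 else 0.0
    return w

# Hypothetical simulated data: three feature groups, only pathway_A is informative.
rng = random.Random(0)
n = 120
groups = {"pathway_A": 4, "pathway_B": 4, "pathway_C": 4}
data = {g: [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        for g, p in groups.items()}
y = [sum(row) + rng.gauss(0, 1) for row in data["pathway_A"]]

# Step 1: one sub-model per group, summarized by its out-of-fold predictions.
oof = {g: out_of_fold(X, y) for g, X in data.items()}

# Step 2: the super learner assigns a non-negative weight to each sub-model.
names = list(oof)
Z = [[oof[g][i] for g in names] for i in range(n)]
weights = dict(zip(names, nonneg_lasso(Z, y)))
print(weights)
```

In this toy setting the non-negativity constraint and the sparsity penalty together shrink the weights of the uninformative groups toward zero, which mirrors the abstract's point about handling redundant sub-models. A binary disease outcome would call for a logistic base learner and a logistic super learner instead.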
Pages: 20
Related papers
17 in total
  • [1] A non-negative spike-and-slab lasso generalized linear stacking prediction modeling method for high-dimensional omics data
    Shen, Junjie
    Wang, Shuo
    Dong, Yongfei
    Sun, Hao
    Wang, Xichao
    Tang, Zaixiang
    [J]. BMC Bioinformatics, 25
  • [2] A novel non-negative Bayesian stacking modeling method for Cancer survival prediction using high-dimensional omics data
    Shen, Junjie
    Wang, Shuo
    Sun, Hao
    Huang, Jie
    Bai, Lu
    Wang, Xichao
    Dong, Yongfei
    Tang, Zaixiang
    [J]. BMC MEDICAL RESEARCH METHODOLOGY, 2024, 24 (01)
  • [3] The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection
    Tang, Zaixiang
    Shen, Yueping
    Zhang, Xinyan
    Yi, Nengjun
    [J]. GENETICS, 2017, 205 (01) : 77 - +
  • [4] High-dimensional generalized median adaptive lasso with application to omics data
    Liu, Yahang
    Gao, Qian
    Wei, Kecheng
    Huang, Chen
    Wang, Ce
    Yu, Yongfu
    Qin, Guoyou
    Wang, Tong
    [J]. BRIEFINGS IN BIOINFORMATICS, 2024, 25 (02)
  • [5] Group spike-and-slab lasso generalized linear models for disease prediction and associated genes detection by incorporating pathway information
    Tang, Zaixiang
    Shen, Yueping
    Li, Yan
    Zhang, Xinyan
    Wen, Jia
    Qian, Chen'ao
    Zhuang, Wenzhuo
    Shi, Xinghua
    Yi, Nengjun
    [J]. BIOINFORMATICS, 2018, 34 (06) : 901 - 910
  • [6] Spike-and-slab least absolute shrinkage and selection operator generalized additive models and scalable algorithms for high-dimensional data analysis
    Guo, Boyi
    Jaeger, Byron C.
    Rahman, A. K. M. Fazlur
    Long, D. Leann
    Yi, Nengjun
    [J]. STATISTICS IN MEDICINE, 2022, 41 (20) : 3899 - 3914
  • [7] A penalized linear mixed model with generalized method of moments for prediction analysis on high-dimensional multi-omics data
    Wang, Xiaqiong
    Wen, Yalu
    [J]. BRIEFINGS IN BIOINFORMATICS, 2022, 23 (04)
  • [8] Multi-kernel linear mixed model with adaptive lasso for prediction analysis on high-dimensional multi-omics data
    Li, Jun
    Lu, Qing
    Wen, Yalu
    [J]. BIOINFORMATICS, 2020, 36 (06) : 1785 - 1794
  • [9] Accelerated Non-negative Latent Factor Analysis on High-dimensional and Sparse Matrices via Generalized Momentum Method
    Liu, Zhigang
    Luo, Xin
    Li, Shuai
    Shang, Mingsheng
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 3051 - 3056
  • [10] Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization
    Slawski, Martin
    Hein, Matthias
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2013, 7 : 3004 - 3056