Transcriptome prediction performance across machine learning models and diverse ancestries

Times cited: 17
Authors
Okoro, Paul C. [1]
Schubert, Ryan [2]
Guo, Xiuqing [3,4]
Johnson, W. Craig [5]
Rotter, Jerome I. [3,4]
Hoeschele, Ina [6,7,8]
Liu, Yongmei [9]
Im, Hae Kyung [10]
Luke, Amy [11]
Dugas, Lara R. [11,12]
Wheeler, Heather E. [1,13,14]
Affiliations
[1] Loyola Univ Chicago, Program Bioinformat, Chicago, IL 60660 USA
[2] Loyola Univ Chicago, Dept Math & Stat, Chicago, IL USA
[3] Harbor UCLA Med Ctr, Inst Translat Genom & Populat Sci, Lundquist Inst, Torrance, CA 90509 USA
[4] Harbor UCLA Med Ctr, Dept Pediat, Torrance, CA 90509 USA
[5] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[6] Virginia Tech, Fralin Life Sci Inst, Blacksburg, VA USA
[7] Virginia Tech, Dept Stat, Blacksburg, VA USA
[8] Wake Forest Sch Med, Winston Salem, NC 27101 USA
[9] Duke Univ, Sch Med, Dept Med, Durham, NC 27706 USA
[10] Univ Chicago, Dept Med, Sect Genet Med, 5841 S Maryland Ave, Chicago, IL 60637 USA
[11] Loyola Univ Chicago, Parkinson Sch Hlth Sci & Publ Hlth, Dept Publ Hlth Sci, Maywood, IL USA
[12] Univ Cape Town, Fac Hlth Sci, Dept Human Biol, Cape Town, South Africa
[13] Loyola Univ Chicago, Dept Biol, Chicago, IL 60660 USA
[14] Loyola Univ Chicago, Dept Comp Sci, Chicago, IL 60660 USA
Keywords
GENOME-WIDE ASSOCIATION; GENE-EXPRESSION; VARIABLE SELECTION; COMPLEX TRAITS; REGRESSION; CETP; STRATIFICATION; REGULARIZATION; INFERENCE; HDL
DOI
10.1016/j.xhgg.2020.100019
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods, such as the elastic net (EN), that make parametric linear assumptions. To potentially further optimize the imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA), comprising individuals of African, Hispanic, and European ancestries, and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiologic Transition Study (METS), comprising individuals of African ancestry. We show that prediction performance is highest when the training and testing populations share similar ancestries, regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and k-nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show that including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.
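The comparison described in the abstract can be sketched for a single gene as follows. This is a minimal illustration, not the authors' pipeline: it assumes prediction performance is scored as the Spearman correlation between predicted and observed expression in the test cohort, and it shows only a linear predictor (a PrediXcan-style weighted sum of SNP dosages, standing in for a fitted EN model) and a hand-written KNN regressor. RF and SVR are omitted. The weights, dosages, expression values, and all function names are hypothetical toy values chosen for illustration.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # constant input: correlation is undefined
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def linear_predict(weights, dosages):
    """Linear (PrediXcan-style) prediction: sum over SNPs of weight * dosage."""
    return [sum(w * d for w, d in zip(weights, row)) for row in dosages]


def knn_predict(train_x, train_y, test_x, k=3):
    """Non-linear alternative: mean expression of the k nearest training genotypes."""
    preds = []
    for x in test_x:
        # (squared Euclidean distance, training expression) pairs, nearest first.
        neighbors = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, t)), y)
            for t, y in zip(train_x, train_y)
        )
        preds.append(mean(y for _, y in neighbors[:k]))
    return preds


# Hypothetical toy data: 3 SNP dosages (0/1/2) per individual for one gene.
train_x = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [0, 1, 0], [0, 2, 0], [1, 1, 1]]
train_y = [0.0, 0.5, 1.0, -0.2, -0.4, 0.3]
test_x = [[0, 1, 2], [1, 0, 1], [2, 1, 0], [1, 2, 1], [2, 2, 2]]
test_obs = [-0.1, 0.4, 0.9, 0.0, 0.7]
en_weights = [0.5, -0.2, 0.0]  # hypothetical sparse, EN-style weights

rho_linear = spearman(linear_predict(en_weights, test_x), test_obs)
rho_knn = spearman(knn_predict(train_x, train_y, test_x), test_obs)
print(f"linear rho = {rho_linear:.3f}, KNN rho = {rho_knn:.3f}")
```

Scoring every model with the same rank-based metric on the same held-out cohort is what makes per-gene comparisons between algorithms meaningful. In a real cross-ancestry evaluation the training and test genotypes would come from different populations, as with MESA and METS here.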
Pages: 14