Transcriptome prediction performance across machine learning models and diverse ancestries

Cited by: 17
Authors
Okoro, Paul C. [1]
Schubert, Ryan [2]
Guo, Xiuqing [3, 4]
Johnson, W. Craig [5]
Rotter, Jerome I. [3, 4]
Hoeschele, Ina [6, 7, 8]
Liu, Yongmei [9]
Im, Hae Kyung [10]
Luke, Amy [11]
Dugas, Lara R. [11, 12]
Wheeler, Heather E. [1, 13, 14]
Affiliations
[1] Loyola Univ Chicago, Program Bioinformat, Chicago, IL 60660 USA
[2] Loyola Univ Chicago, Dept Math & Stat, Chicago, IL USA
[3] Harbor UCLA Med Ctr, Inst Translat Genom & Populat Sci, Lundquist Inst, Torrance, CA 90509 USA
[4] Harbor UCLA Med Ctr, Dept Pediat, Torrance, CA 90509 USA
[5] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[6] Virginia Tech, Fralin Life Sci Inst, Blacksburg, VA USA
[7] Virginia Tech, Dept Stat, Blacksburg, VA USA
[8] Wake Forest Sch Med, Winston Salem, NC 27101 USA
[9] Duke Univ, Sch Med, Dept Med, Durham, NC 27706 USA
[10] Univ Chicago, Dept Med, Sect Genet Med, 5841 S Maryland Ave, Chicago, IL 60637 USA
[11] Loyola Univ Chicago, Parkinson Sch Hlth Sci & Publ Hlth, Dept Publ Hlth Sci, Maywood, IL USA
[12] Univ Cape Town, Fac Hlth Sci, Dept Human Biol, Cape Town, South Africa
[13] Loyola Univ Chicago, Dept Biol, Chicago, IL 60660 USA
[14] Loyola Univ Chicago, Dept Comp Sci, Chicago, IL 60660 USA
Source
Keywords
GENOME-WIDE ASSOCIATION; GENE-EXPRESSION; VARIABLE SELECTION; COMPLEX TRAITS; REGRESSION; CETP; STRATIFICATION; REGULARIZATION; INFERENCE; HDL;
DOI
10.1016/j.xhgg.2020.100019
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions, such as the elastic net (EN). To further optimize the imputation of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and compared their performance with that of EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA), comprising individuals of African, Hispanic, and European ancestries, and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS), comprising individuals of African ancestries. We show that prediction performance is highest when the training and testing populations share similar ancestries, regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K-nearest neighbor (KNN), RF outperformed EN for some genes, particularly between disparate ancestries, suggesting that RF imputation performance may be more robust and less variable across global populations. When applied to a high-density lipoprotein (HDL) phenotype, including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML models into PrediXcan and diversifying training populations to include more global ancestries, we may uncover new genes associated with complex traits.
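The train-in-one-population, test-in-another design described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses simulated genotype dosages rather than MESA or METS data, implements only KNN (one of the four algorithms named) in plain Python, and scores predictions with Pearson correlation, which is an assumed metric here since the abstract does not name one. The allele frequencies, effect sizes, and all function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for a constant vector; report 0 by convention
    return cov / math.sqrt(vx * vy)


def knn_predict(train_X, train_y, query, k=5):
    """Predict expression as the mean over the k nearest training genotypes."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), i)
        for i, row in enumerate(train_X)
    )
    nearest = [i for _, i in dists[:k]]
    return sum(train_y[i] for i in nearest) / len(nearest)


def simulate(n, allele_freqs, effects, noise_sd, rng):
    """Simulate 0/1/2 genotype dosages and one gene's expression (toy model)."""
    X, y = [], []
    for _ in range(n):
        # Each dosage is the number of effect alleles across two chromosomes.
        geno = [sum(rng.random() < f for _ in range(2)) for f in allele_freqs]
        expr = sum(g * b for g, b in zip(geno, effects)) + rng.gauss(0, noise_sd)
        X.append(geno)
        y.append(expr)
    return X, y


def evaluate(train_X, train_y, test_X, test_y, k=5):
    """Correlation between observed and KNN-predicted expression in a test set."""
    pred = [knn_predict(train_X, train_y, q, k) for q in test_X]
    return pearson_r(test_y, pred)


if __name__ == "__main__":
    rng = random.Random(0)
    effects = [0.8, -0.5, 0.3, 0.0, 0.0]         # hypothetical cis-SNP effects
    train_freqs = [0.3, 0.4, 0.2, 0.5, 0.1]      # hypothetical training population
    other_freqs = [0.05, 0.8, 0.6, 0.2, 0.4]     # hypothetical, different frequencies

    train_X, train_y = simulate(300, train_freqs, effects, 0.5, rng)
    same_X, same_y = simulate(100, train_freqs, effects, 0.5, rng)
    diff_X, diff_y = simulate(100, other_freqs, effects, 0.5, rng)

    print("same-population test r:", round(evaluate(train_X, train_y, same_X, same_y), 3))
    print("other-population test r:", round(evaluate(train_X, train_y, diff_X, diff_y), 3))
```

Swapping `knn_predict` for another regressor while keeping `evaluate` unchanged is one way to compare algorithms gene by gene under the same train and test split. The toy simulation shares causal effects across populations, so it should not be read as reproducing the paper's findings.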
Pages: 14