Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights

Cited by: 345
Authors
Pasolli, Edoardo [1]
Truong, Duy Tin [1]
Malik, Faizan [2]
Waldron, Levi [2]
Segata, Nicola [1]
Affiliations
[1] Univ Trento, Ctr Integrat Biol, Trento, Italy
[2] CUNY, Grad Sch Publ Hlth & Hlth Policy, New York, NY 10021 USA
Funding
US National Science Foundation; US National Institutes of Health;
Keywords
MULTICATEGORY CLASSIFICATION METHODS; HUMAN GUT MICROBIOME; COMPREHENSIVE EVALUATION; FECAL MICROBIOTA; GENE-EXPRESSION; VALIDATION; PREDICTION; REGRESSION; SELECTION;
DOI
10.1371/journal.pcbi.1004977
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Shotgun metagenomic analysis of the human-associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, raising questions about how well disease-predictive models generalize across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and to the quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and the presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed on a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and by the use of strain-specific markers instead of species-level taxonomic abundances. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, adding healthy (control) samples from other studies to the training sets improved disease-prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than associations with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis.
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
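The two evaluation schemes contrasted in the abstract, within-study cross-validation and transfer of a model to a study it was not trained on, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic Gaussian features rather than real microbiome profiles, the study names are hypothetical, and the nearest-centroid classifier is a stand-in chosen only to keep the sketch free of dependencies. It is not the MetAML framework or its API.

```python
import random
from collections import defaultdict


def fit_centroids(X, y):
    """Return the mean feature vector of each class label."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(centroids[c], row))
    return min(centroids, key=dist)


def accuracy(train, test):
    """Train on (features, label) pairs in `train`, score on `test`."""
    X_train, y_train = zip(*train)
    centroids = fit_centroids(X_train, y_train)
    hits = sum(predict(centroids, x) == label for x, label in test)
    return hits / len(test)


def within_study_cv(samples, k=5, seed=0):
    """Mean accuracy over k folds drawn from a single study."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [samples[i] for i in idx if i not in held_out]
        test = [samples[i] for i in fold]
        scores.append(accuracy(train, test))
    return sum(scores) / k


def cross_study(studies, target):
    """Train on every study except `target`, then test on `target`."""
    train = [s for name, data in studies.items() if name != target for s in data]
    return accuracy(train, studies[target])


def make_study(n, shift, seed):
    """Synthetic cohort (assumption): `shift` mimics a study-specific batch effect."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # 0 = control, 1 = disease
        row = [
            rng.gauss(1.5 * label + shift, 1.0),  # informative feature
            rng.gauss(shift, 1.0),                # noise carrying the batch effect
            rng.gauss(-1.0 * label, 1.0),         # second informative feature
        ]
        data.append((row, label))
    return data


# Hypothetical study names and batch shifts, for illustration only.
studies = {
    "study_A": make_study(60, 0.0, seed=1),
    "study_B": make_study(60, 0.8, seed=2),
    "study_C": make_study(60, -0.5, seed=3),
}
cv_scores = {name: within_study_cv(data) for name, data in studies.items()}
transfer_scores = {name: cross_study(studies, name) for name in studies}

for name in studies:
    print(f"{name}: within-study CV={cv_scores[name]:.2f}, "
          f"cross-study={transfer_scores[name]:.2f}")
```

Comparing the two scores per study shows how much accuracy depends on whether the test cohort was represented in training. The sketch makes no claim about the magnitude or direction of that difference on real metagenomic data.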
Pages: 26