Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights

Cited by: 345
Authors
Pasolli, Edoardo [1]
Truong, Duy Tin [1]
Malik, Faizan [2]
Waldron, Levi [2]
Segata, Nicola [1]
Affiliations
[1] Univ Trento, Ctr Integrat Biol, Trento, Italy
[2] CUNY, Grad Sch Publ Hlth & Hlth Policy, New York, NY 10021 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
MULTICATEGORY CLASSIFICATION METHODS; HUMAN GUT MICROBIOME; COMPREHENSIVE EVALUATION; FECAL MICROBIOTA; GENE-EXPRESSION; VALIDATION; PREDICTION; REGRESSION; SELECTION;
DOI
10.1371/journal.pcbi.1004977
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010 ; 081704 ;
Abstract
Shotgun metagenomic analysis of the human-associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and to quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease-prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis.
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
Pages: 26