Gene-based microbiome representation enhances host phenotype classification

Cited by: 2
Authors
Deschenes, Thomas [1, 2, 3]
Tohoundjona, Fred Wilfried Elom [1, 2]
Plante, Pier-Luc [1, 2, 3]
Di Marzo, Vincenzo [1, 2, 4, 5, 6, 7]
Raymond, Frederic [1, 2, 3, 4]
Affiliations
[1] Univ Laval, Inst Nutr & Aliments Fonct INAF, Ctr Nutr Sante & Soc NUTRISS, Quebec City, PQ, Canada
[2] Canada Res Excellence Chair Microbiome Endocannabi, Quebec City, PQ, Canada
[3] Univ Laval, Inst Intelligence & Donnees, Quebec City, PQ, Canada
[4] Univ Laval, Ecole Nutr, Fac Sci Agr & alimentat FSAA, Quebec City, PQ, Canada
[5] Ctr Rech Inst Univ cardiol & pneumol Quebec IUCPQ, Quebec City, PQ, Canada
[6] Univ Laval, Fac Med, Dept Med, Quebec City, PQ, Canada
[7] Joint Int Unit Chem & Biomol Res Microbiome & its, Quebec City, PQ, Canada
Funding
Canadian Institutes of Health Research; Natural Sciences and Engineering Research Council of Canada
Keywords
microbiome; machine learning; metagenomics; shotgun microbiome; feature selection; gene clusters; interpretable models; metabolic health; gut-brain axis; endocannabinoidome; human gut microbiome
DOI
10.1128/msystems.00531-23
Chinese Library Classification (CLC) number
Q93 [Microbiology]
Discipline classification code
071005; 100705
Abstract
With concurrent advances in microbiome research and machine learning, the gut microbiome has become of great interest for the potential discovery of biomarkers for classifying host health status. Shotgun metagenomics data derived from the human microbiome comprise a high-dimensional set of microbial features. Using such complex data to model host-microbiome interactions remains a challenge, as retaining de novo content yields a highly granular set of microbial features. In this study, we compared the prediction performance of machine learning approaches across different types of data representations derived from shotgun metagenomics. These representations include the commonly used taxonomic and functional profiles and the more granular gene cluster approach. For the five case-control datasets used in this study (type 2 diabetes, obesity, liver cirrhosis, colorectal cancer, and inflammatory bowel disease), gene-based approaches, whether used alone or in combination with reference-based data types, yielded classification performance similar to or better than that of the taxonomic and functional profiles. In addition, we show that using subsets of gene families from specific functional categories of genes highlights the importance of these functions for the host phenotype. This study demonstrates that both reference-free microbiome representations and curated metagenomic annotations can provide relevant representations for machine learning based on metagenomic data.
IMPORTANCE: Data representation is an essential part of machine learning performance when using metagenomic data. In this work, we show that different microbiome representations provide varied host phenotype classification performance depending on the dataset. In classification tasks, untargeted microbiome gene content can provide similar or improved classification compared to taxonomic profiling. Feature selection based on biological function also improves classification performance for some pathologies. Function-based feature selection combined with interpretable machine learning algorithms can generate new hypotheses that can potentially be tested mechanistically. This work thus proposes new approaches to represent microbiome data for machine learning that can strengthen the findings associated with metagenomic data.
Pages: 16