Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease

Cited by: 13
Authors
Kubinski, Ryszard [1]
Djamen-Kepaou, Jean-Yves [1]
Zhanabaev, Timur [1]
Hernandez-Garcia, Alex [2]
Bauer, Stefan [3]
Hildebrand, Falk [4, 5]
Korcsmaros, Tamas [4, 5]
Karam, Sani [1]
Jantchou, Prevost [6]
Kafi, Kamran [1]
Martin, Ryan D. [1]
Affiliations
[1] Phyla Technol Inc, Montreal, PQ, Canada
[2] Univ Montreal, Quebec Artificial Intelligence Inst, Montreal, PQ, Canada
[3] Max Planck Inst Intelligent Syst, Tubingen, Germany
[4] Quadram Inst Biosci, Gut Microbes & Hlth, Norwich, Norfolk, England
[5] Earlham Inst, Norwich, Norfolk, England
[6] Ctr Hosp Univ St Justine, Montreal, PQ, Canada
Funding
UK Biotechnology and Biological Sciences Research Council; EU Horizon 2020; European Research Council;
Keywords
inflammatory bowel disease; machine learning; gut microbiome; batch effect reduction; data normalization; QIIME2; PICRUSt2; COMPOSITIONAL DATA; ULCERATIVE-COLITIS; RISK-FACTORS; DIVERSITY; DELAY; EXPRESSION; PREDICTION; THERAPY; IMPACT; SILVA;
DOI
10.3389/fgene.2022.784397
Chinese Library Classification
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Patients with inflammatory bowel disease (IBD) wait months and undergo numerous invasive procedures between the initial appearance of symptoms and receiving a diagnosis. To reduce time to diagnosis and improve patient wellbeing, machine learning algorithms capable of diagnosing IBD from the composition of the gut microbiome are currently being explored. To date, these models have had limited clinical application because their performance decreases when they are applied to a new cohort of patient samples. Various methods have been developed to analyze microbiome data, and these may improve the generalizability of machine learning IBD diagnostic tests. With an abundance of methods, there is a need to benchmark the performance and generalizability of various machine learning pipelines (from data processing to training a machine learning model) for microbiome-based IBD diagnostic tools. We collected fifteen 16S rRNA microbiome datasets (7,707 samples) from North America to benchmark combinations of gut microbiome features, data normalization and transformation methods, batch effect correction methods, and machine learning models. Pipeline generalizability to new cohorts of patients was evaluated with two binary classification metrics following leave-one-dataset-out (LODO) cross-validation, where all samples from one study were left out of the training set and tested upon. We demonstrate that taxonomic features processed with a compositional transformation method and batch effect correction with the naive zero-centering method attain the best classification performance. In addition, machine learning models that identify non-linear decision boundaries between labels are more generalizable than those that are linearly constrained. Lastly, we illustrate the importance of generating a curated training dataset to ensure similar performance across patient demographics. These findings will help improve the generalizability of machine learning models as we move towards non-invasive diagnostic and disease management tools for patients with IBD.
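
The abstract names three pipeline steps: a compositional transformation, naive zero-centering per batch, and leave-one-dataset-out (LODO) cross-validation. The sketch below is a minimal illustration of those ideas in plain Python and is not the authors' implementation. It assumes the centered log-ratio (CLR) as the compositional transformation, a pseudocount of 1, and that each study is centered on its own feature means; the function names, toy counts, and study labels are hypothetical.

```python
import math
from collections import defaultdict


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumption here) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def zero_center_by_batch(rows, batches):
    """Subtract each batch's per-feature mean from that batch's samples."""
    n_features = len(rows[0])
    sums = defaultdict(lambda: [0.0] * n_features)
    sizes = defaultdict(int)
    for row, batch in zip(rows, batches):
        sizes[batch] += 1
        for j, value in enumerate(row):
            sums[batch][j] += value
    return [
        [value - sums[batch][j] / sizes[batch] for j, value in enumerate(row)]
        for row, batch in zip(rows, batches)
    ]


def lodo_splits(batches):
    """Yield (held_out, train_idx, test_idx), one fold per dataset."""
    for held_out in sorted(set(batches)):
        train = [i for i, b in enumerate(batches) if b != held_out]
        test = [i for i, b in enumerate(batches) if b == held_out]
        yield held_out, train, test


# Hypothetical toy data: four samples, three taxa, two studies.
counts = [[10, 0, 5], [3, 7, 1], [8, 2, 2], [1, 1, 9]]
studies = ["A", "A", "B", "B"]

transformed = [clr(row) for row in counts]
features = zero_center_by_batch(transformed, studies)
folds = list(lodo_splits(studies))
```

Each fold in `folds` holds out every sample from one study, so a classifier trained on the `train` indices never sees the cohort it is scored on. That is the property that lets LODO estimate performance on a new cohort.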
Pages: 22
Related papers
50 in total
  • [1] Gut microbiome-based supervised machine learning for clinical diagnosis of inflammatory bowel diseases
    Manandhar, Ishan
    Alimadadi, Ahmad
    Aryal, Sachin
    Munroe, Patricia B.
    Joe, Bina
    Cheng, Xi
    AMERICAN JOURNAL OF PHYSIOLOGY-GASTROINTESTINAL AND LIVER PHYSIOLOGY, 2021, 320 (03): : G328 - G337
  • [2] Gut microbiome-based therapeutics in inflammatory bowel disease
    Hu, Kelly A.
    Gubatan, John
    CLINICAL AND TRANSLATIONAL DISCOVERY, 2023, 3 (02):
  • [3] Noninvasive, microbiome-based diagnosis of inflammatory bowel disease
    Zheng, Jiaying
    Sun, Qianru
    Zhang, Mengjing
    Liu, Chengyu
    Su, Qi
    Zhang, Lin
    Xu, Zhilu
    Lu, Wenqi
    Ching, Jessica
    Tang, Whitney
    Cheung, Chun Pan
    Hamilton, Amy L.
    O'Brien, Amy L. Wilson
    Wei, Shu Chen
    Bernstein, Charles N.
    Rubin, David T.
    Chang, Eugene B.
    Morrison, Mark
    Kamm, Michael A.
    Chan, Francis K. L.
    Zhang, Jingwan
    Ng, Siew C.
    NATURE MEDICINE, 2024, : 3555 - 3567
  • [4] INFLAMMATORY BOWEL DISEASE CLASSIFICATION USING THE GUT MICROBIOME: A BENCHMARK OF MICROBIAL DATA ANALYSIS METHODS
    Kubinski, Ryszard
    Djamen, Jean
    Zhanabaev, Timur
    Martin, Ryan
    INFLAMMATORY BOWEL DISEASES, 2021, 27 : S40 - S40
  • [5] INFLAMMATORY BOWEL DISEASE CLASSIFICATION USING THE GUT MICROBIOME: A BENCHMARK OF MICROBIAL DATA ANALYSIS METHODS
    Kubinski, Ryszard
    Djamen, Jean
    Zhanabaev, Timur
    Martin, Ryan
    GASTROENTEROLOGY, 2021, 160 (03) : S55 - S55
  • [6] Machine Learning Strategy for Gut Microbiome-Based Diagnostic Screening of Cardiovascular Disease
    Aryal, Sachin
    Alimadadi, Ahmad
    Manandhar, Ishan
    Joe, Bina
    Cheng, Xi
    HYPERTENSION, 2020, 76 (05) : 1555 - 1562
  • [7] Faecal microbiome-based machine learning for multi-class disease diagnosis
    Qi Su
    Qin Liu
    Raphaela Iris Lau
    Jingwan Zhang
    Zhilu Xu
    Yun Kit Yeoh
    Thomas W. H. Leung
    Whitney Tang
    Lin Zhang
    Jessie Q. Y. Liang
    Yuk Kam Yau
    Jiaying Zheng
    Chengyu Liu
    Mengjing Zhang
    Chun Pan Cheung
    Jessica Y. L. Ching
    Hein M. Tun
    Jun Yu
    Francis K. L. Chan
    Siew C. Ng
    Nature Communications, 13
  • [8] Faecal microbiome-based machine learning for multi-class disease diagnosis
    Su, Qi
    Liu, Qin
    Lau, Raphaela Iris
    Zhang, Jingwan
    Xu, Zhilu
    Yeoh, Yun Kit
    Leung, Thomas W. H.
    Tang, Whitney
    Zhang, Lin
    Liang, Jessie Q. Y.
    Yau, Yuk Kam
    Zheng, Jiaying
    Liu, Chengyu
    Zhang, Mengjing
    Cheung, Chun Pan
    Ching, Jessica Y. L.
    Tun, Hein M.
    Yu, Jun
    Chan, Francis K. L.
    Ng, Siew C.
    NATURE COMMUNICATIONS, 2022, 13 (01)
  • [9] Current and future microbiome-based therapies in inflammatory bowel disease
    Montrose, Jonathan A.
    Kurada, Satya
    Fischer, Monika
    CURRENT OPINION IN GASTROENTEROLOGY, 2024, 40 (04) : 258 - 267
  • [10] Machine learning model for microbiome-based diagnosis of bacterial vaginosis
    Gupta, Somesh
    Challa, Apoorva
    Kachhawa, Garima
    Nagpal, Sunil
    Sood, Seema
    Taneja, Bhupesh
    SEXUALLY TRANSMITTED DISEASES, 2024, 51 (01) : S425 - S425