Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease

Cited by: 13
Authors
Kubinski, Ryszard [1]
Djamen-Kepaou, Jean-Yves [1]
Zhanabaev, Timur [1]
Hernandez-Garcia, Alex [2]
Bauer, Stefan [3]
Hildebrand, Falk [4, 5]
Korcsmaros, Tamas [4, 5]
Karam, Sani [1]
Jantchou, Prevost [6]
Kafi, Kamran [1]
Martin, Ryan D. [1]
Affiliations
[1] Phyla Technol Inc, Montreal, PQ, Canada
[2] Univ Montreal, Quebec Artificial Intelligence Inst, Montreal, PQ, Canada
[3] Max Planck Inst Intelligent Syst, Tubingen, Germany
[4] Quadram Inst Biosci, Gut Microbes & Hlth, Norwich, Norfolk, England
[5] Earlham Inst, Norwich, Norfolk, England
[6] Ctr Hosp Univ St Justine, Montreal, PQ, Canada
Funding
UK Biotechnology and Biological Sciences Research Council; EU Horizon 2020; European Research Council
Keywords
inflammatory bowel disease; machine learning; gut microbiome; batch effect reduction; data normalization; QIIME2; PICRUSt2; COMPOSITIONAL DATA; ULCERATIVE-COLITIS; RISK-FACTORS; DIVERSITY; DELAY; EXPRESSION; PREDICTION; THERAPY; IMPACT; SILVA;
DOI
10.3389/fgene.2022.784397
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Patients with inflammatory bowel disease (IBD) wait months and undergo numerous invasive procedures between the initial appearance of symptoms and receiving a diagnosis. To reduce the time to diagnosis and improve patient wellbeing, machine learning algorithms capable of diagnosing IBD from the composition of the gut microbiome are currently being explored. To date, these models have had limited clinical application because their performance decreases when they are applied to a new cohort of patient samples. Various methods have been developed to analyze microbiome data that may improve the generalizability of machine learning IBD diagnostic tests. With an abundance of methods, there is a need to benchmark the performance and generalizability of various machine learning pipelines (from data processing to training a machine learning model) for microbiome-based IBD diagnostic tools. We collected fifteen 16S rRNA microbiome datasets (7,707 samples) from North America to benchmark combinations of gut microbiome features, data normalization and transformation methods, batch effect correction methods, and machine learning models. Pipeline generalizability to new cohorts of patients was evaluated with two binary classification metrics following leave-one-dataset-out (LODO) cross-validation, where all samples from one study were left out of the training set and used as the test set. We demonstrate that taxonomic features processed with a compositional transformation method and batch effect correction with the naive zero-centering method attain the best classification performance. In addition, machine learning models that identify non-linear decision boundaries between labels are more generalizable than those that are linearly constrained. Lastly, we illustrate the importance of generating a curated training dataset to ensure similar performance across patient demographics. These findings will help improve the generalizability of machine learning models as we move towards non-invasive diagnostic and disease management tools for patients with IBD.
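The three processing steps named in the abstract (a compositional transformation, zero-centering batch correction, and LODO cross-validation) can be illustrated with a minimal sketch. The abstract does not say which compositional transformation or which zero-centering variant was used, so the centered log-ratio (CLR) transform, the pseudocount of 0.5, the per-dataset feature-mean subtraction, the function names, and the toy counts below are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumed value) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def zero_center_by_batch(rows, batches):
    """Subtract each batch's per-feature mean from that batch's samples."""
    idx_by_batch = defaultdict(list)
    for i, batch in enumerate(batches):
        idx_by_batch[batch].append(i)
    centered = [None] * len(rows)
    for idxs in idx_by_batch.values():
        n_feat = len(rows[idxs[0]])
        means = [sum(rows[i][j] for i in idxs) / len(idxs) for j in range(n_feat)]
        for i in idxs:
            centered[i] = [rows[i][j] - means[j] for j in range(n_feat)]
    return centered


def lodo_splits(batches):
    """Yield (held_out_dataset, train_indices, test_indices), one per dataset."""
    for held_out in sorted(set(batches)):
        train = [i for i, b in enumerate(batches) if b != held_out]
        test = [i for i, b in enumerate(batches) if b == held_out]
        yield held_out, train, test


# Hypothetical toy data: 4 samples x 3 taxa, drawn from two studies.
counts = [[10, 0, 5], [3, 7, 1], [0, 2, 9], [4, 4, 4]]
datasets = ["study_A", "study_A", "study_B", "study_B"]

features = zero_center_by_batch([clr(row) for row in counts], datasets)
splits = list(lodo_splits(datasets))
```

Each split holds out every sample from one study, so a classifier trained on `train` indices is always scored on a cohort it has never seen. Zero-centering here uses only feature values and dataset membership, never the diagnosis labels, which is why it can also be applied to the held-out study.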
Pages: 22