Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease

Cited by: 13
Authors
Kubinski, Ryszard [1]
Djamen-Kepaou, Jean-Yves [1]
Zhanabaev, Timur [1]
Hernandez-Garcia, Alex [2]
Bauer, Stefan [3]
Hildebrand, Falk [4,5]
Korcsmaros, Tamas [4,5]
Karam, Sani [1]
Jantchou, Prevost [6]
Kafi, Kamran [1]
Martin, Ryan D. [1]
Affiliations
[1] Phyla Technol Inc, Montreal, PQ, Canada
[2] Univ Montreal, Quebec Artificial Intelligence Inst, Montreal, PQ, Canada
[3] Max Planck Inst Intelligent Syst, Tubingen, Germany
[4] Quadram Inst Biosci, Gut Microbes & Hlth, Norwich, Norfolk, England
[5] Earlham Inst, Norwich, Norfolk, England
[6] Ctr Hosp Univ St Justine, Montreal, PQ, Canada
Funding
UK Biotechnology and Biological Sciences Research Council; EU Horizon 2020; European Research Council
Keywords
inflammatory bowel disease; machine learning; gut microbiome; batch effect reduction; data normalization; QIIME2; PICRUSt2; COMPOSITIONAL DATA; ULCERATIVE-COLITIS; RISK-FACTORS; DIVERSITY; DELAY; EXPRESSION; PREDICTION; THERAPY; IMPACT; SILVA;
DOI
10.3389/fgene.2022.784397
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Patients with inflammatory bowel disease (IBD) wait months and undergo numerous invasive procedures between the initial appearance of symptoms and receiving a diagnosis. To reduce the time to diagnosis and improve patient wellbeing, machine learning algorithms capable of diagnosing IBD from the gut microbiome's composition are currently being explored. To date, these models have had limited clinical application due to decreased performance when applied to a new cohort of patient samples. Various methods have been developed to analyze microbiome data which may improve the generalizability of machine learning IBD diagnostic tests. With an abundance of methods, there is a need to benchmark the performance and generalizability of various machine learning pipelines (from data processing to training a machine learning model) for microbiome-based IBD diagnostic tools. We collected fifteen 16S rRNA microbiome datasets (7,707 samples) from North America to benchmark combinations of gut microbiome features, data normalization and transformation methods, batch effect correction methods, and machine learning models. Pipeline generalizability to new cohorts of patients was evaluated with two binary classification metrics following leave-one-dataset-out (LODO) cross-validation, where all samples from one study were left out of the training set and tested upon. We demonstrate that taxonomic features processed with a compositional transformation method and batch effect correction with the naive zero-centering method attain the best classification performance. In addition, machine learning models that identify non-linear decision boundaries between labels are more generalizable than those that are linearly constrained. Lastly, we illustrate the importance of generating a curated training dataset to ensure similar performance across patient demographics.
These findings will help improve the generalizability of machine learning models as we move towards non-invasive diagnostic and disease management tools for patients with IBD.
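The pipeline steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the centered log-ratio (CLR) as the compositional transformation (the abstract does not say which one was used), a pseudocount of 0.5 to handle zero counts, and reads "naive zero-centering" as subtracting each study's per-feature mean. The function names and toy counts are hypothetical.

```python
import math
from collections import defaultdict


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample (pseudocount is an assumption)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def zero_center_by_batch(rows, batches):
    """Subtract each batch's per-feature mean from the samples of that batch."""
    by_batch = defaultdict(list)
    for i, b in enumerate(batches):
        by_batch[b].append(i)
    out = [None] * len(rows)
    for idx in by_batch.values():
        n_feat = len(rows[idx[0]])
        means = [sum(rows[i][j] for i in idx) / len(idx) for j in range(n_feat)]
        for i in idx:
            out[i] = [rows[i][j] - means[j] for j in range(n_feat)]
    return out


def lodo_splits(batches):
    """Yield (held_out, train_idx, test_idx), holding out one dataset at a time."""
    for held_out in sorted(set(batches)):
        train = [i for i, b in enumerate(batches) if b != held_out]
        test = [i for i, b in enumerate(batches) if b == held_out]
        yield held_out, train, test


# Hypothetical toy data: 4 samples x 3 taxa, drawn from two studies.
counts = [[10, 0, 5], [3, 7, 1], [8, 2, 2], [1, 1, 9]]
studies = ["A", "A", "B", "B"]

features = zero_center_by_batch([clr(r) for r in counts], studies)
splits = list(lodo_splits(studies))
```

Each entry of `splits` gives the indices on which a classifier would be trained and the indices of the held-out study on which it would be scored, so no samples from the test study appear in training.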
Pages: 22
Related papers
50 in total
  • [31] Advances in Machine Learning Processing of Big Data from Disease Diagnosis Sensors
    Lu, Shasha
    Yang, Jianyu
    Gu, Yu
    He, Dongyuan
    Wu, Haocheng
    Sun, Wei
    Xu, Dong
    Li, Changming
    Guo, Chunxian
    ACS SENSORS, 2024, 9 (03) : 1134 - 1148
  • [32] Volumetric visceral fat machine learning phenotype on CT for differential diagnosis of inflammatory bowel disease
    Zhou, Ziling
    Xiong, Ziman
    Cheng, Ran
    Luo, Qingyu
    Li, Yuanqiu
    Xie, Qingguo
    Xiao, Peng
    Hu, Daoyu
    Hu, Xuemei
    Shen, Yaqi
    Li, Zhen
    EUROPEAN RADIOLOGY, 2023, 33 (03) : 1862 - 1872
  • [33] Volumetric visceral fat machine learning phenotype on CT for differential diagnosis of inflammatory bowel disease
    Ziling Zhou
    Ziman Xiong
    Ran Cheng
    Qingyu Luo
    Yuanqiu Li
    Qingguo Xie
    Peng Xiao
    Daoyu Hu
    Xuemei Hu
    Yaqi Shen
    Zhen Li
    European Radiology, 2023, 33 : 1862 - 1872
  • [34] Longitudinal gut microbiome dynamics in relation to disease flares in Inflammatory Bowel Disease, pilot data from the IBD-Tracker study
    Gacesa, R.
    Klaassen, M. A. Y.
    Collij, V.
    Bjork, J. R.
    Blankenstein, A. D.
    Jansen, B. H.
    Dijkstra, G.
    Visschedijk, M.
    Festen, E. A. M.
    Ananthakrishnan, A.
    Alm, E.
    Weersma, R. K.
    JOURNAL OF CROHNS & COLITIS, 2024, 18 : I198 - I199
  • [35] Machine learning-based solution reveals cuproptosis features in inflammatory bowel disease
    Liu, Le
    Liang, Liping
    Yang, Chenghai
    Chen, Ye
    FRONTIERS IN IMMUNOLOGY, 2023, 14
  • [36] Students’ Course Results Prediction Based on Data Processing and Machine Learning Methods
    Jinyang Liu
    Chuantao Yin
    Kunyang Wang
    Minghui Guan
    Xi Wang
    Hong Zhou
    Journal of Signal Processing Systems, 2022, 94 : 1199 - 1211
  • [37] Students' Course Results Prediction Based on Data Processing and Machine Learning Methods
    Liu, Jinyang
    Yin, Chuantao
    Wang, Kunyang
    Guan, Minghui
    Wang, Xi
    Zhou, Hong
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2022, 94 (11): : 1199 - 1211
  • [38] DEVELOPMENT AND VALIDATION OF A MACHINE LEARNING TOOL FOR EARLY DIAGNOSIS OF INFLAMMATORY BOWEL DISEASE IN THE PRIMARY CARE SETTING: A POPULATION BASED STUDY
    Ber, Tahel Ilan
    Tov, Amir Ben
    Gazit, Sivan
    Steinberg-Koch, Shlomit
    Getz, Benny
    Jenudi, Yonatan
    Underberger, Dan
    Ramni, Or
    Ben-Horin, Shomron
    INFLAMMATORY BOWEL DISEASES, 2022, 28 : S20 - S21
  • [39] DEVELOPMENT AND VALIDATION OF A MACHINE LEARNING TOOL FOR EARLY DIAGNOSIS OF INFLAMMATORY BOWEL DISEASE IN THE PRIMARY CARE SETTING: A POPULATION BASED STUDY
    Ber, Tahel Ilan
    Ben Tov, Amir
    Gazit, Sivan
    Steinberg-Koch, Shlomit
    Getz, Benny
    Jenudi, Yonatan
    Underberger, Dan
    Ramni, Or
    Ben-Horin, Shomron
    GASTROENTEROLOGY, 2022, 162 (03) : S20 - S21
  • [40] Speech processing for early Alzheimer Disease diagnosis: Machine learning based approach
    Ben Ammar, Randa
    Ben Ayed, Yassine
    2018 IEEE/ACS 15TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2018,