Microbiome Preprocessing Machine Learning Pipeline

Cited by: 11
Authors
Jasner, Yoel Y. [1]
Belogolovski, Anna [1]
Ben-Itzhak, Meirav [1]
Koren, Omry [2]
Louzoun, Yoram [1]
Affiliations
[1] Bar Ilan Univ, Dept Math, Ramat Gan, Israel
[2] Bar Ilan Univ, Azrieli Fac Med, Ramat Gan, Israel
Source
Frontiers in Immunology | 2021 / Vol. 12
Keywords
pipeline; machine learning; 16S; OTU; ASV; feature selection;
DOI
10.3389/fimmu.2021.677870
Chinese Library Classification number
R392 [Medical Immunology]; Q939.91 [Immunology]
Discipline classification code
100102
Abstract
Background: 16S sequencing results are often used for Machine Learning (ML) tasks. 16S gene sequences are represented as feature counts, which are associated with a taxonomic representation. Raw feature counts may not be the optimal representation for ML.
Methods: We checked multiple preprocessing steps and tested for the optimal combination in 16S sequencing-based classification tasks. We computed the contribution of each step to the accuracy, as measured by the Area Under the Curve (AUC) of the classification.
Results: We show that the log of the feature counts is much more informative than the relative counts. We further show that merging features associated with the same taxonomy at a given level, through a dimension reduction step for each group of bacteria, improves the AUC. Finally, we show that z-scoring has a very limited effect on the results.
Conclusions: The preprocessing of microbiome 16S data is crucial for optimal microbiome-based Machine Learning. These preprocessing steps are integrated into MIPMLP, the Microbiome Preprocessing Machine Learning Pipeline, which is available as a stand-alone version at https://github.com/louzounlab/microbiome/tree/master/Preprocess or as a service at http://mip-mlp.math.biu.ac.il/Home. Both contain the code and standard test sets.
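The three steps named in the abstract (log transform, merging features that share a taxonomy, z-scoring) can be illustrated with a minimal NumPy sketch. This is not the MIPMLP code or its API. The function names, the pseudocount `epsilon`, the `;`-separated taxonomy strings, the toy counts, and the order of the steps are all assumptions made for illustration. The per-group mean used here is only the simplest stand-in for the per-group dimension reduction the abstract mentions.

```python
import numpy as np


def log_transform(counts, epsilon=0.1):
    """Log10 of raw counts; epsilon is an assumed pseudocount that avoids log(0)."""
    return np.log10(np.asarray(counts, dtype=float) + epsilon)


def merge_by_taxonomy(values, taxa, level):
    """Average all columns whose taxonomy agrees down to `level` ranks.

    The mean is a stand-in for a per-group dimension reduction step.
    """
    keys = [";".join(t.split(";")[:level]) for t in taxa]
    groups = sorted(set(keys))
    merged = np.column_stack([
        values[:, [i for i, k in enumerate(keys) if k == g]].mean(axis=1)
        for g in groups
    ])
    return merged, groups


def z_score_columns(values):
    """Center each column and scale it to unit variance (constant columns stay at 0)."""
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (values - mean) / std


def preprocess(counts, taxa, level=2, epsilon=0.1, z_score=True):
    """Hypothetical pipeline: log transform, merge by taxonomy, optional z-score."""
    logged = log_transform(counts, epsilon)
    merged, groups = merge_by_taxonomy(logged, taxa, level)
    if z_score:
        merged = z_score_columns(merged)
    return merged, groups


# Toy data: 2 samples x 3 features (made-up counts, not from the paper).
counts = np.array([[10, 0, 5],
                   [100, 20, 0]])
taxa = [
    "k__Bacteria;p__Firmicutes;c__Bacilli",
    "k__Bacteria;p__Firmicutes;c__Clostridia",
    "k__Bacteria;p__Bacteroidetes;c__Bacteroidia",
]

features, groups = preprocess(counts, taxa, level=2)
print(groups)
print(features)
```

Merging at `level=2` collapses the two Firmicutes features into one column, so the classifier sees one feature per phylum rather than one per sequence variant.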
Pages: 11