Microbiome Preprocessing Machine Learning Pipeline

Cited by: 11
Authors
Jasner, Yoel Y. [1]
Belogolovski, Anna [1]
Ben-Itzhak, Meirav [1]
Koren, Omry [2]
Louzoun, Yoram [1]
Affiliations
[1] Bar Ilan Univ, Dept Math, Ramat Gan, Israel
[2] Bar Ilan Univ, Azrieli Fac Med, Ramat Gan, Israel
Source
Frontiers in Immunology | 2021 / Vol. 12
Keywords
pipeline; machine learning; 16S; OTU; ASV; feature selection
DOI
10.3389/fimmu.2021.677870
Chinese Library Classification (CLC) number
R392 [Medical Immunology]; Q939.91 [Immunology]
Discipline classification code
100102
Abstract
Background: 16S sequencing results are often used for Machine Learning (ML) tasks. 16S gene sequences are represented as feature counts, which are associated with a taxonomic representation. Raw feature counts may not be the optimal representation for ML.

Methods: We examined multiple preprocessing steps and tested which combination is optimal for 16S sequencing-based classification tasks. We computed the contribution of each step to the accuracy, as measured by the Area Under the Curve (AUC) of the classification.

Results: We show that the log of the feature counts is much more informative than the relative counts. We further show that merging features associated with the same taxonomy at a given level, through a dimension reduction step for each group of bacteria, improves the AUC. Finally, we show that z-scoring has a very limited effect on the results.

Conclusions: The preprocessing of microbiome 16S data is crucial for optimal microbiome-based Machine Learning. These preprocessing steps are integrated into MIPMLP (Microbiome Preprocessing Machine Learning Pipeline), which is available as a stand-alone version at https://github.com/louzounlab/microbiome/tree/master/Preprocess or as a service at http://mip-mlp.math.biu.ac.il/Home. Both contain the code and standard test sets.
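
The three steps named in the abstract (log transform, merging features by taxonomy, z-scoring) can be sketched as follows. This is a minimal illustration under stated assumptions, not the MIPMLP code: the function names, the pseudo-count `epsilon=0.1`, the `;`-separated lineage strings, the toy data, and the order of the steps are all hypothetical. The abstract does not say which dimension reduction is applied within each taxonomic group, so the sketch substitutes a plain per-group mean.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def log_transform(counts, epsilon=0.1):
    """Return log10(count + epsilon) for a samples x features table.

    epsilon is an assumed pseudo-count that keeps zero counts finite.
    """
    return [[math.log10(c + epsilon) for c in row] for row in counts]


def merge_by_taxonomy(table, taxonomy, level):
    """Merge feature columns that share the same lineage up to `level`.

    Each group is collapsed to the mean of its columns. The mean is a
    stand-in for the per-group dimension reduction the abstract mentions.
    """
    groups = defaultdict(list)
    for col, lineage in enumerate(taxonomy):
        key = ";".join(lineage.split(";")[:level])
        groups[key].append(col)
    names = sorted(groups)
    merged = [[mean(row[c] for c in groups[name]) for name in names]
              for row in table]
    return names, merged


def z_score_columns(table):
    """Z-score each column; constant columns are mapped to 0."""
    scaled_columns = []
    for col in zip(*table):
        mu, sigma = mean(col), pstdev(col)
        scaled_columns.append(
            [(v - mu) / sigma if sigma > 0 else 0.0 for v in col]
        )
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical toy data: 3 samples x 3 features with made-up lineages.
counts = [[0, 10, 100], [5, 0, 50], [20, 30, 0]]
taxonomy = [
    "k__Bacteria;p__Firmicutes;g__A",
    "k__Bacteria;p__Firmicutes;g__B",
    "k__Bacteria;p__Bacteroidetes;g__C",
]

names, merged = merge_by_taxonomy(log_transform(counts), taxonomy, level=2)
scaled = z_score_columns(merged)
```

With `level=2` the two Firmicutes features collapse into one column, so `merged` has two columns per sample. Replacing the mean with the first principal component of each group would be closer to a true dimension reduction step, at the cost of an extra dependency.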
Pages: 11