Lessons Learnt From Using the Machine Learning Random Forest Algorithm to Predict Virulence in Streptococcus pyogenes

Cited by: 3
Authors
Buckley, Sean J. [1]
Harvey, Robert J. [1, 2]
Affiliations
[1] Univ Sunshine Coast, Sch Hlth & Behav Sci, Maroochydore, Qld, Australia
[2] Sunshine Coast Hlth Inst, Birtinya, Qld, Australia
Keywords
Streptococcus pyogenes; machine learning; random forest; virulence; phenotype metadata; PLASMINOGEN
DOI
10.3389/fcimb.2021.809560
Chinese Library Classification number
R392 [Medical immunology]; Q939.91 [Immunology]
Discipline classification code
100102
Abstract
Group A Streptococcus (GAS) is a globally significant human pathogen. The extensive variability of the GAS genome, virulence phenotypes and clinical outcomes renders it an excellent candidate for the application of genotype-phenotype association studies in the era of whole-genome sequencing. We have catalogued the distribution and diversity of the transcription regulators of GAS, and employed phylogenetics, concordance metrics and machine learning (ML) to test for associations. In this review, we communicate the lessons learnt in the context of the recent bacterial genotype-phenotype association studies of others that have utilised both genome-wide association studies (GWAS) and ML. We envisage a promising future for the application of GWAS in bacterial genotype-phenotype association studies and foresee the increasing use of ML. However, progress in this field is hindered by several outstanding bottlenecks. These include the shortcomings that are observed when GWAS techniques that have been fine-tuned on human genomes are applied to bacterial genomes. Furthermore, there is a deficit of easy-to-use end-to-end workflows, and a lag in the collection of detailed phenotype and clinical genomic metadata. We propose a novel quality control protocol for the collection of high-quality GAS virulence phenotype coupled to clinical outcome data. Finally, we incorporate this protocol into a workflow for testing genotype-phenotype associations using ML and 'linked' patient-microbe genome sets that better represent the infection event.
Pages: 7