Lessons Learnt From Using the Machine Learning Random Forest Algorithm to Predict Virulence in Streptococcus pyogenes

Cited by: 3
Authors
Buckley, Sean J. [1 ]
Harvey, Robert J. [1 ,2 ]
Affiliations
[1] Univ Sunshine Coast, Sch Hlth & Behav Sci, Maroochydore, Qld, Australia
[2] Sunshine Coast Hlth Inst, Birtinya, Qld, Australia
Keywords
Streptococcus pyogenes; machine learning; random forest; virulence; phenotype metadata; plasminogen
DOI
10.3389/fcimb.2021.809560
Chinese Library Classification (CLC) number
R392 [Medical immunology]; Q939.91 [Immunology]
Discipline classification code
100102
Abstract
Group A Streptococcus (GAS) is a globally significant human pathogen. The extensive variability of the GAS genome, virulence phenotypes and clinical outcomes renders it an excellent candidate for genotype-phenotype association studies in the era of whole-genome sequencing. We have catalogued the distribution and diversity of the transcription regulators of GAS, and employed phylogenetics, concordance metrics and machine learning (ML) to test for associations. In this review, we communicate the lessons learnt in the context of recent bacterial genotype-phenotype association studies by others that have utilised both genome-wide association studies (GWAS) and ML. We envisage a promising future for the application of GWAS in bacterial genotype-phenotype association studies and foresee the increasing use of ML. However, progress in this field is hindered by several outstanding bottlenecks. These include the shortcomings observed when GWAS techniques that have been fine-tuned on human genomes are applied to bacterial genomes. Furthermore, there is a deficit of easy-to-use end-to-end workflows, and a lag in the collection of detailed phenotype and clinical genomic metadata. We propose a novel quality control protocol for the collection of high-quality GAS virulence phenotype data coupled to clinical outcome data. Finally, we incorporate this protocol into a workflow for testing genotype-phenotype associations using ML and 'linked' patient-microbe genome sets that better represent the infection event.
Pages: 7