Lessons Learnt From Using the Machine Learning Random Forest Algorithm to Predict Virulence in Streptococcus pyogenes

Times cited: 3
Authors
Buckley, Sean J. [1]
Harvey, Robert J. [1, 2]
Affiliations
[1] Univ Sunshine Coast, Sch Hlth & Behav Sci, Maroochydore, Qld, Australia
[2] Sunshine Coast Hlth Inst, Birtinya, Qld, Australia
Keywords
Streptococcus pyogenes; machine learning; random forest; virulence; phenotype metadata; PLASMINOGEN;
DOI
10.3389/fcimb.2021.809560
Chinese Library Classification (CLC) number
R392 [Medical immunology]; Q939.91 [Immunology];
Discipline classification code
100102;
Abstract
Group A Streptococcus (GAS) is a globally significant human pathogen. The extensive variability of the GAS genome, virulence phenotypes and clinical outcomes renders it an excellent candidate for genotype-phenotype association studies in the era of whole-genome sequencing. We have catalogued the distribution and diversity of the transcription regulators of GAS, and employed phylogenetics, concordance metrics and machine learning (ML) to test for associations. In this review, we communicate the lessons learnt in the context of recent bacterial genotype-phenotype association studies by others that have utilised both genome-wide association studies (GWAS) and ML. We envisage a promising future for the application of GWAS in bacterial genotype-phenotype association studies and foresee the increasing use of ML. However, progress in this field is hindered by several outstanding bottlenecks. These include the shortcomings observed when GWAS techniques that have been fine-tuned on human genomes are applied to bacterial genomes. Furthermore, there is a deficit of easy-to-use end-to-end workflows, and a lag in the collection of detailed phenotype and clinical genomic metadata. We propose a novel quality control protocol for the collection of high-quality GAS virulence phenotype data coupled to clinical outcome data. Finally, we incorporate this protocol into a workflow for testing genotype-phenotype associations using ML and 'linked' patient-microbe genome sets that better represent the infection event.
Pages: 7