A Fast Machine Learning Workflow for Rapid Phenotype Prediction from Whole Shotgun Metagenomes

Cited by: 0
Authors
Carrieri, Anna Paola [1]
Rowe, Will P. M. [2]
Winn, Martyn [2]
Pyzer-Knapp, Edward O. [1]
Affiliations
[1] IBM Res UK, Sci Tech Daresbury, Warrington, Cheshire, England
[2] STFC Daresbury Lab, Sci Comp Dept, Warrington, Cheshire, England
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Research on the microbiome is an emerging and crucial science with many applications in healthcare, food safety, precision agriculture and environmental studies. Huge amounts of DNA from microbial communities are being sequenced and analyzed by scientists interested in extracting meaningful biological information from this big data. Analyzing massive microbiome sequencing datasets, which embed the functions and interactions of thousands of different bacterial, fungal and viral species, is a significant computational challenge. Artificial intelligence has the potential to build predictive models that provide insights for specific cutting-edge applications such as guiding diagnostics and developing personalised treatments, as well as maintaining soil health and fertility. Current machine learning workflows that predict traits of host organisms from their commensal microbiome do not take into account the whole genetic material constituting the microbiome, instead basing the analysis on specific marker genes. In this paper we introduce what is, to the best of our knowledge, the first machine learning workflow that efficiently performs host phenotype prediction from whole shotgun metagenomes by computing similarity-preserving compact representations of the genetic material. Our workflow enables prediction tasks, such as classification and regression, from terabytes of raw sequencing data without any preprocessing through expensive bioinformatics pipelines. We compare four different classifiers in terms of time, accuracy and uncertainty of predictions. More precisely, we demonstrate that our ML workflow can efficiently classify real data with high accuracy, using examples from dog and human metagenomic studies, representing a step towards real-time diagnostics and showing potential for cloud applications.
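The idea of a similarity-preserving compact representation can be illustrated with a small sketch. The abstract does not name the sketching algorithm or the four classifiers, so the code below swaps in a plain MinHash signature over k-mers and a 1-nearest-neighbour rule purely as stand-ins. All function names, parameters, reads and labels are hypothetical and are not taken from the paper.

```python
import hashlib
from typing import Iterable, List, Sequence


def kmers(read: str, k: int) -> Iterable[str]:
    """Yield every overlapping k-mer of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def _hash(kmer: str, seed: int) -> int:
    """Deterministic 64-bit hash of a k-mer; the seed selects the hash function."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(reads: Iterable[str], k: int = 4, size: int = 64) -> List[int]:
    """Reduce a sample's reads to a fixed-length signature (assumed MinHash stand-in).

    Slot i keeps the smallest value of hash function i over all k-mers seen,
    so the signature length is independent of the amount of input data.
    """
    sketch = [2 ** 64] * size  # sentinel larger than any 64-bit hash
    for read in reads:
        for kmer in kmers(read, k):
            for seed in range(size):
                h = _hash(kmer, seed)
                if h < sketch[seed]:
                    sketch[seed] = h
    return sketch


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of matching slots, an estimate of the Jaccard similarity of the k-mer sets."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def predict(sketch: Sequence[int],
            train_sketches: Sequence[Sequence[int]],
            train_labels: Sequence[str]) -> str:
    """Return the label of the most similar training sketch (1-nearest neighbour)."""
    best = max(range(len(train_sketches)),
               key=lambda i: similarity(sketch, train_sketches[i]))
    return train_labels[best]


# Toy data: two hypothetical training samples and one query sample.
train = [minhash_sketch(["ACGTACGTACGT"]), minhash_sketch(["TTTTGGGGCCCC"])]
labels = ["phenotype_a", "phenotype_b"]
query = minhash_sketch(["ACGTACGTACGA"])
print(predict(query, train, labels))
```

Because each sample is reduced to a vector of fixed length, the downstream model never touches the raw reads, which is what makes prediction from very large inputs tractable in principle. A real workflow would stream reads from sequencing files and would likely use a more capable classifier than the nearest-neighbour rule assumed here.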
Pages: 9434-9439
Page count: 6