A Fast Machine Learning Workflow for Rapid Phenotype Prediction from Whole Shotgun Metagenomes

Authors
Carrieri, Anna Paola [1]
Rowe, Will P. M. [2]
Winn, Martyn [2]
Pyzer-Knapp, Edward O. [1]
Affiliations
[1] IBM Res UK, Sci Tech Daresbury, Warrington, Cheshire, England
[2] STFC Daresbury Lab, Sci Comp Dept, Warrington, Cheshire, England
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Research on the microbiome is an emerging and crucial science with many applications in healthcare, food safety, precision agriculture and environmental studies. Huge amounts of DNA from microbial communities are being sequenced and analyzed by scientists seeking to extract meaningful biological information from this big data. Analyzing massive microbiome sequencing datasets, which embed the functions and interactions of thousands of different bacterial, fungal and viral species, is a significant computational challenge. Artificial intelligence has the potential to build predictive models that can provide insights for specific cutting-edge applications such as guiding diagnostics and developing personalised treatments, as well as maintaining soil health and fertility. Current machine learning workflows that predict traits of host organisms from their commensal microbiome do not take into account the whole genetic material constituting the microbiome, instead basing the analysis on specific marker genes. In this paper we introduce what is, to the best of our knowledge, the first machine learning workflow that efficiently performs host phenotype prediction from whole shotgun metagenomes by computing similarity-preserving compact representations of the genetic material. Our workflow enables prediction tasks, such as classification and regression, from terabytes of raw sequencing data without any preprocessing through expensive bioinformatics pipelines. We compare four different classifiers in terms of time, accuracy and uncertainty of predictions. More precisely, we demonstrate that our ML workflow can efficiently classify real data with high accuracy, using examples from dog and human metagenomic studies, representing a step towards real-time diagnostics and a potential for cloud applications.
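The abstract does not name the method used to compute its similarity-preserving compact representations. As a purely illustrative stand-in, the sketch below uses MinHash over k-mers, a common way to reduce a set of reads to a short fixed-length signature whose agreement with another signature approximates the Jaccard similarity of the underlying k-mer sets. The function names, the choice of `blake2b`, and the parameters `k` and `num_hashes` are assumptions made for this example and are not taken from the paper.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping substring of length k from seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer, seed):
    """Map a k-mer to a 64-bit integer; each seed acts as a different hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(reads, k=4, num_hashes=64):
    """Return a fixed-length MinHash signature for a collection of reads.

    Hypothetical helper: for each of num_hashes seeded hash functions,
    keep the minimum hash value over all distinct k-mers in the sample.
    """
    distinct = {km for read in reads for km in kmers(read, k)}
    if not distinct:
        raise ValueError("no k-mers found; reads are shorter than k")
    return [min(hash_kmer(km, seed) for km in distinct)
            for seed in range(num_hashes)]


def estimate_similarity(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


if __name__ == "__main__":
    sample_1 = ["ACGTACGTAC", "GGTACCGTTA"]
    sample_2 = ["ACGTACGTAC", "TTGACCAGTA"]
    s1 = minhash_sketch(sample_1)
    s2 = minhash_sketch(sample_2)
    print(len(s1), estimate_similarity(s1, s2))
```

Under these assumptions each sample, whatever its raw size, is reduced to `num_hashes` integers, so a signature of this kind could serve as a fixed-length feature vector for a classifier or regressor.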
Pages: 9434-9439
Page count: 6