Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning

Times cited: 2
Authors
Park, Jonathan J. [1, 2, 3, 4]
Chen, Sidi [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
Affiliations
[1] Yale Univ, Dept Genet, Sch Med, New Haven, CT 06520 USA
[2] Yale Univ, Syst Biol Inst, New Haven, CT 06520 USA
[3] Yale Univ, Ctr Canc Syst Biol, New Haven, CT 06520 USA
[4] Yale Univ, MD PhD Program, New Haven, CT 06520 USA
[5] Yale Univ, Immunobiol Program, New Haven, CT 06520 USA
[6] Yale Univ, Mol Cell Biol Genet & Dev Program, New Haven, CT 06520 USA
[7] Yale Univ, Combined Program Biol & Biomed Sci, New Haven, CT 06520 USA
[8] Yale Univ, Dept Neurosurg, Sch Med, New Haven, CT 06520 USA
[9] Yale Univ, Ctr Comprehens Canc, Sch Med, New Haven, CT 06520 USA
[10] Yale Univ, Stem Cell Ctr, Sch Med, New Haven, CT 06520 USA
[11] Yale Univ, Ctr Liver, Sch Med, New Haven, CT 06520 USA
[12] Yale Univ, Ctr Biomed Data Sci, Sch Med, New Haven, CT 06520 USA
[13] Yale Univ, Ctr RNA Sci & Med, Sch Med, New Haven, CT 06520 USA
Source
PATTERNS | 2022 / Vol. 3 / Issue 2
Keywords
MULTIPLE SEQUENCE ALIGNMENT; CORONAVIRUS; SPIKE
DOI
10.1016/j.patter.2021.100407
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The COVID-19 pandemic caused by SARS-CoV-2 has become a major threat across the globe. Here, we developed machine learning approaches to identify key pathogenic regions in coronavirus genomes. We trained and evaluated 7,562,625 models on 3,665 genomes, including SARS-CoV-2, MERS-CoV, SARS-CoV, and other coronaviruses of human and animal origin, to return quantitative and biologically interpretable signatures at nucleotide and amino acid resolution. We identified hotspots across the SARS-CoV-2 genome, including previously unappreciated features in spike, RdRp, and other proteins. Finally, we integrated pathogenicity genomic profiles with B cell and T cell epitope predictions for enrichment of sequence targets to help guide vaccine development. These results provide a systematic map of predicted pathogenicity in SARS-CoV-2 that incorporates sequence, structural, and immunologic features, providing an unbiased collection of genetic elements for functional studies. This metavirome-based framework can also be applied for rapid characterization of new coronavirus strains or emerging pathogenic viruses.
Pages: 15
Related papers
  • [1] Machine learning algorithms for identification of SARS-CoV-2 in neonates
    Dobrijevic, D.
    Katanic, J.
    Pastor, K.
    CLINICA CHIMICA ACTA, 2024, 558
  • [2] Identification of Genomic Variants of SARS-CoV-2 Using Nanopore Sequencing
    Capraru, Ionut Dragos
    Romanescu, Mirabela
    Anghel, Flavia Medana
    Oancea, Cristian
    Marian, Catalin
    Sirbu, Ioan Ovidiu
    Chis, Aimee Rodica
    Ciordas, Paula Diana
    MEDICINA-LITHUANIA, 2022, 58 (12)
  • [3] Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms
    Singh, Om Prakash
    Vallejo, Marta
    El-Badawy, Ismail M.
    Aysha, Ali
    Madhanagopal, Jagannathan
    Faudzi, Ahmad Athif Mohd
    COMPUTERS IN BIOLOGY AND MEDICINE, 2021, 136
  • [4] Feature selection for effective prediction of SARS-COV-2 using machine learning
    Punacha, Gagan
    Adiga, Rama
    GENES & GENOMICS, 2024, 46 (01) : 95 - 112
  • [5] Prediction of antigenic peptides of SARS-CoV-2 pathogen using machine learning
    Bukhari, Syed Nisar Hussain
    Ogudo, Kingsley A.
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [7] Feature selection for effective prediction of SARS-COV-2 using machine learning
    Punacha, Gagan
    Adiga, Rama
    GENES & GENOMICS, 2024, 46 : 341 - 354
  • [8] Identification of SARS-CoV-2 Main Protease Inhibitors Using Chemical Similarity Analysis Combined with Machine Learning
    Juarez-Mercado, Karina Euridice
    Gomez-Hernandez, Milton Abraham
    Salinas-Trujano, Juana
    Cordova-Bahena, Luis
    Espitia, Clara
    Perez-Tapia, Sonia Mayra
    Medina-Franco, Jose L.
    Velasco-Velazquez, Marco A.
    PHARMACEUTICALS, 2024, 17 (02)
  • [9] Subgenomic RNA identification in SARS-CoV-2 genomic sequencing data
    Parker, Matthew D.
    Lindsey, Benjamin B.
    Leary, Shay
    Gaudieri, Silvana
    Chopra, Abha
    Wyles, Matthew
    Angyal, Adrienn
    Green, Luke R.
    Parsons, Paul
    Tucker, Rachel M.
    Brown, Rebecca
    Groves, Danielle
    Johnson, Katie
    Carrilero, Laura
    Heffer, Joe
    Partridge, David G.
    Evans, Cariad
    Raza, Mohammad
    Keeley, Alexander J.
    Smith, Nikki
    Filipe, Ana Da Silva
    Shepherd, James G.
    Davis, Chris
    Bennett, Sahan
    Sreenu, Vattipally B.
    Kohl, Alain
    Aranday-Cortes, Elihu
    Tong, Lily
    Nichols, Jenna
    Thomson, Emma C.
    Wang, Dennis
    Mallal, Simon
    de Silva, Thushan I.
    GENOME RESEARCH, 2021, 31 (04) : 645 - 658
  • [10] Machine Learning Models Identify Inhibitors of SARS-CoV-2
    Gawriljuk, Victor O.
    Zin, Phyo Phyo Kyaw
    Puhl, Ana C.
    Zorn, Kimberley M.
    Foil, Daniel H.
    Lane, Thomas R.
    Hurst, Brett
    Tavella, Tatyana Almeida
    Maranhao Costa, Fabio Trindade
    Lakshmanane, Premkumar
    Bernatchez, Jean
    Godoy, Andre S.
    Oliva, Glaucius
    Siqueira-Neto, Jair L.
    Madrid, Peter B.
    Ekins, Sean
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2021, 61 (09) : 4224 - 4235