Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning

Cited by: 2
Authors
Park, Jonathan J. [1, 2, 3, 4]
Chen, Sidi [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
Affiliations
[1] Yale Univ, Dept Genet, Sch Med, New Haven, CT 06520 USA
[2] Yale Univ, Syst Biol Inst, New Haven, CT 06520 USA
[3] Yale Univ, Ctr Canc Syst Biol, New Haven, CT 06520 USA
[4] Yale Univ, MD PhD Program, New Haven, CT 06520 USA
[5] Yale Univ, Immunobiol Program, New Haven, CT 06520 USA
[6] Yale Univ, Mol Cell Biol Genet & Dev Program, New Haven, CT 06520 USA
[7] Yale Univ, Combined Program Biol & Biomed Sci, New Haven, CT 06520 USA
[8] Yale Univ, Dept Neurosurg, Sch Med, New Haven, CT 06520 USA
[9] Yale Univ, Ctr Comprehens Canc, Sch Med, New Haven, CT 06520 USA
[10] Yale Univ, Stem Cell Ctr, Sch Med, New Haven, CT 06520 USA
[11] Yale Univ, Ctr Liver, Sch Med, New Haven, CT 06520 USA
[12] Yale Univ, Ctr Biomed Data Sci, Sch Med, New Haven, CT 06520 USA
[13] Yale Univ, Ctr RNA Sci & Med, Sch Med, New Haven, CT 06520 USA
Source
PATTERNS | 2022 / Vol. 3 / No. 02
Keywords
MULTIPLE SEQUENCE ALIGNMENT; CORONAVIRUS; SPIKE
DOI
10.1016/j.patter.2021.100407
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The COVID-19 pandemic caused by SARS-CoV-2 has become a major threat across the globe. Here, we developed machine learning approaches to identify key pathogenic regions in coronavirus genomes. We trained and evaluated 7,562,625 models on 3,665 genomes, including SARS-CoV-2, MERS-CoV, SARS-CoV, and other coronaviruses of human and animal origin, to return quantitative and biologically interpretable signatures at nucleotide and amino acid resolution. We identified hotspots across the SARS-CoV-2 genome, including previously unappreciated features in spike, RdRp, and other proteins. Finally, we integrated pathogenicity genomic profiles with B cell and T cell epitope predictions to enrich for sequence targets that could help guide vaccine development. These results provide a systematic map of predicted pathogenicity in SARS-CoV-2 that incorporates sequence, structural, and immunologic features, providing an unbiased collection of genetic elements for functional studies. This metavirome-based framework can also be applied to the rapid characterization of new coronavirus strains or emerging pathogenic viruses.
Pages: 15
Related papers
50 in total
  • [21] Genomic Epidemiology of SARS-CoV-2 in Pakistan
    Song, Shuhui
    Li, Cuiping
    Kang, Lu
    Tian, Dongmei
    Badar, Nazish
    Ma, Wentai
    Zhao, Shilei
    Jiang, Xuan
    Wang, Chun
    Sun, Yongqiao
    Li, Wenjie
    Lei, Meng
    Li, Shuangli
    Qi, Qiuhui
    Ikram, Aamer
    Salman, Muhammad
    Umair, Massab
    Shireen, Huma
    Batool, Fatima
    Zhang, Bing
    Chen, Hua
    Yang, Yun-Gui
    Abbasi, Amir Ali
    Li, Mingkun
    Xue, Yongbiao
    Bao, Yiming
    GENOMICS, PROTEOMICS & BIOINFORMATICS, 2021, 19 (05): 727-740
  • [22] Utilizing genomic signatures to gain insights into the dynamics of SARS-CoV-2 through Machine and Deep Learning techniques
    Elsherbini, Ahmed M. A.
    Elkholy, Amr Hassan
    Fadel, Youssef M.
    Goussarov, Gleb
    Elshal, Ahmed Mohamed
    El-Hadidi, Mohamed
    Mysara, Mohamed
    BMC BIOINFORMATICS, 2024, 25 (01)
  • [24] Machine Learning to Assess Treatments for SARS-CoV-2 in Hospitalized Patients
    Lee, Christian
    Ruiz, Adrian Alexis
    Nossaman, Bobby
    Yockelson, Shaun
    Sara, John-Paul
    Otero, Tiffany
    Long, William
    Raza, Daniel
    ANESTHESIA AND ANALGESIA, 2024, 139 (05): 150-151
  • [25] Identification of methylation signatures and rules for predicting the severity of SARS-CoV-2 infection with machine learning methods
    Liu, Zhiyang
    Meng, Mei
    Ding, ShiJian
    Zhou, XiaoChao
    Feng, KaiYan
    Huang, Tao
    Cai, Yu-Dong
    FRONTIERS IN MICROBIOLOGY, 2022, 13
  • [26] Identification of patient demographic, clinical, and SARS-CoV-2 genomic factors associated with severe COVID-19 using supervised machine learning: a retrospective multicenter study
    Nirmalarajah, Kuganya
    Aftanas, Patryk
    Barati, Shiva
    Chien, Emily
    Crowl, Gloria
    Faheem, Amna
    Farooqi, Lubna
    Jamal, Alainna J.
    Khan, Saman
    Kotwa, Jonathon D.
    Li, Angel X.
    Mozafarihashjin, Mohammad
    Nasir, Jalees A.
    Shigayeva, Altynay
    Yim, Winfield
    Yip, Lily
    Zhong, Xi Zoe
    Katz, Kevin
    Kozak, Robert
    Mcarthur, Andrew G.
    Daneman, Nick
    Maguire, Finlay
    Mcgeer, Allison J.
    Duvvuri, Venkata R.
    Mubareka, Samira
    BMC INFECTIOUS DISEASES, 2025, 25 (01)
  • [27] Alignment-free genome analysis of SARS-CoV-2 using machine learning
    Randhawa, G. S.
    Soltysiak, M. P. M.
    Roz, H. E. L.
    de Souza, C. P. E.
    Hill, K. A.
    Kari, L.
    ENVIRONMENTAL AND MOLECULAR MUTAGENESIS, 2020, 61: 55
  • [28] Routine Laboratory Blood Tests Predict SARS-CoV-2 Infection Using Machine Learning
    Yang, He S.
    Hou, Yu
    Vasovic, Ljiljana V.
    Steel, Peter A. D.
    Chadburn, Amy
    Racine-Brzostek, Sabrina E.
    Velu, Priya
    Cushing, Melissa M.
    Loda, Massimo
    Kaushal, Rainu
    Zhao, Zhen
    Wang, Fei
    CLINICAL CHEMISTRY, 2020, 66 (11): 1396-1404
  • [29] Prospects of Using Machine Learning and Diamond Nanosensing for High Sensitivity SARS-CoV-2 Diagnosis
    Qureshi, Shahzad Ahmad
    Aman, Haroon
    Schirhagl, Romana
    MAGNETOCHEMISTRY, 2023, 9 (07)
  • [30] Effect of Neurological Manifestations on SARS-CoV-2 Infection Prognosis using Machine Learning Models
    Thepmankorn, Parisorn
    Heshmati, Keyvan
    Souayah, Sami
    Shafiq, Basit
    Adam, Tarek
    Adam, Nabil
    Souayah, Nizar
    NEUROLOGY, 2021, 96 (15)