Genome-wide analysis of 10664 SARS-CoV-2 genomes to identify virus strains in 73 countries based on single nucleotide polymorphism

Cited by: 3
Authors
Ghosh, Nimisha [1]
Saha, Indrajit [2]
Sharma, Nikhil [3]
Nandi, Suman [2]
Plewczynski, Dariusz [4, 5]
Affiliations
[1] Siksha O Anusandhan Deemed Univ, Inst Tech Educ & Res, Dept Comp Sci & Informat Technol, Bhubaneswar, Odisha, India
[2] Natl Inst Tech Teachers Training & Res, Dept Comp Sci & Engn, Kolkata, W Bengal, India
[3] Jaypee Inst Informat Technol, Dept Elect & Commun Engn, Noida, Uttar Pradesh, India
[4] Warsaw Univ Technol, Fac Math & Informat Sci, Lab Bioinformat & Computat Genom, Warsaw, Poland
[5] Univ Warsaw, Ctr New Technol, Lab Funct & Struct Genom, Warsaw, Poland
Keywords
Clustering; COVID-19; Multiple sequence alignment; Non-synonymous SNP; SARS-CoV-2; SERVER
DOI
10.1016/j.virusres.2021.198401
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification code
071005; 100705
Abstract
Since the onslaught of SARS-CoV-2, the research community has been searching for a vaccine against this virus. During this period, however, the virus has mutated to adapt to different environmental conditions around the world, which has made vaccine design more challenging. In this situation, identifying virus strains is a timely and important task. We have performed a genome-wide analysis of 10664 SARS-CoV-2 genomes from 73 countries to identify Single Nucleotide Polymorphisms (SNPs) and prepare an SNP dataset of SARS-CoV-2. Using this SNP data, hierarchical clustering is exploited by applying Average Linkage and Complete Linkage separately, with the Jaccard and Hamming distance functions, in order to identify the virus strains as clusters present in the SNP data. The consensus of both clustering results is also considered, while the Silhouette index is used as a cluster validity index both to measure the goodness of the clusters and to determine the number of clusters, or virus strains. As a result, we have identified five major clusters, or virus strains, present worldwide. Apart from quantitative measures, these clusters are also visualized using a Visual Assessment of Tendency (VAT) plot, and the evolution of these clusters is shown. Furthermore, the top 10 signature SNPs are identified in each cluster, and the non-synonymous signature SNPs are visualized in the respective protein structures. Sequence- and structural-homology-based predictions, along with the protein structural stability of these non-synonymous signature SNPs, are also reported in order to judge the characteristics of the identified clusters. Consequently, T85I, Q57H and R203M in NSP2, ORF3a and Nucleocapsid, respectively, are found to be responsible for Cluster 1, as they are damaging and unstable non-synonymous signature SNPs. Similarly, F506L and S507C in Exon are responsible for both Clusters 3 and 4, while Clusters 2 and 5 do not exhibit such behaviour because they have no non-synonymous signature SNPs. In addition, the code, the SNP dataset, the 10664 labelled SARS-CoV-2 strains and additional supplementary results are provided through our website for further use.
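The abstract names the clustering pipeline (hierarchical clustering with Average or Complete Linkage, Jaccard or Hamming distance, and the Silhouette index to choose the number of clusters) but not how it is implemented. The following is a minimal, standard-library-only Python sketch of that general technique, not the authors' code. It assumes each aligned genome has already been reduced to a binary vector over SNP sites (1 = alternative allele present); the function names, the toy matrix and the pairing of Jaccard with Average Linkage and Hamming with Complete Linkage are illustrative assumptions. The consensus step, the VAT plot and signature-SNP selection are not shown. The naive merging loop is only practical for toy inputs; for 10664 genomes one would instead use library routines such as SciPy's `scipy.cluster.hierarchy.linkage` and scikit-learn's `silhouette_score`.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard distance between two binary SNP vectors (1 = alternative allele)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if union == 0 else 1.0 - inter / union


def hamming(a, b):
    """Normalised Hamming distance: fraction of SNP sites at which two genomes differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def agglomerate(X, dist, linkage, k):
    """Naive agglomerative (bottom-up) hierarchical clustering, cut at k clusters.

    linkage="average": cluster distance = mean of all pairwise distances.
    linkage="complete": cluster distance = maximum pairwise distance.
    Returns per-genome cluster labels and the full pairwise distance matrix.
    """
    links = {"average": lambda ds: sum(ds) / len(ds), "complete": max}
    if linkage not in links:
        raise ValueError(f"unsupported linkage: {linkage}")
    link = links[linkage]
    D = [[dist(p, q) for q in X] for p in X]
    clusters = [[i] for i in range(len(X))]  # start with one genome per cluster
    while len(clusters) > k:
        # Find the closest pair of clusters under the chosen linkage.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = link([D[i][j] for i in clusters[a] for j in clusters[b]])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
    labels = [0] * len(X)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels, D


def silhouette(labels, D):
    """Mean Silhouette width over all genomes (requires at least two clusters).

    Genomes in singleton clusters score 0, following the usual convention.
    """
    n = len(labels)
    scores = []
    for i in range(n):
        same = [D[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within the genome's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(D[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def best_k(X, dist, linkage, k_range):
    """Pick the number of clusters whose partition has the highest mean Silhouette."""
    results = []
    for k in k_range:
        labels, D = agglomerate(X, dist, linkage, k)
        results.append((silhouette(labels, D), k, labels))
    s, k, labels = max(results, key=lambda r: r[0])
    return k, s, labels


if __name__ == "__main__":
    # Hypothetical toy SNP matrix: 6 genomes x 5 SNP sites.
    X = [
        [1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1],
    ]
    for dist, linkage in ((jaccard, "average"), (hamming, "complete")):
        k, s, labels = best_k(X, dist, linkage, range(2, 5))
        print(dist.__name__, linkage, "k =", k, "silhouette =", round(s, 3), labels)
```

On the toy matrix, both combinations choose two clusters that separate the first three genomes from the last three. Selecting the number of clusters by maximising the mean Silhouette width mirrors the role the abstract gives the Silhouette index. Any real choice of candidate range for the number of clusters is an assumption here.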
Pages: 19