Alignment-free sequence comparison for virus genomes based on location correlation coefficient

Cited by: 7
Authors
He, Lily [1]
Sun, Siyang [2]
Zhang, Qianyue [2]
Bao, Xiaona [1]
Li, Peter K. [3]
Affiliations
[1] Beijing Univ Civil Engn & Architecture, Sch Sci, Beijing 102616, Peoples R China
[2] Renmin Univ China, High Sch, Beijing 100080, Peoples R China
[3] Tsinghua Univ, Sch Life Sci, Beijing 100084, Peoples R China
Keywords
SARS-CoV-2; Alignment-free; Correlation measure; DNA sequence
DOI
10.1016/j.meegid.2021.105106
Chinese Library Classification (CLC) number
R51 [Infectious diseases]
Discipline classification code
100401
Abstract
Coronaviruses (especially SARS-CoV-2) are characterized by rapid mutation and widespread transmission. Because these characteristics easily lead to global pandemics, studying the evolutionary relationships between viruses is essential for clinical diagnosis. DNA sequencing has played an important role in evolutionary analysis. Recent alignment-free methods can overcome the problems of traditional alignment-based methods, which are costly in both time and memory. This paper proposes a novel alignment-free method called the correlation coefficient feature vector (CCFV), which defines a correlation measure of the L-step delay of a nucleotide location from its location in the original DNA sequence. The resulting numerical feature is a 16 × L-dimensional vector describing the distribution characteristics of the nucleotide positions in a DNA sequence. The proposed L-step delay correlation measure is related to some types of L + 1 spaced mers. Unlike traditional gene comparison, our method avoids the computational complexity of multiple sequence alignment and hence improves the speed of sequence comparison. Our method is applied to evolutionary analysis of common human viruses, including SARS-CoV-2, Dengue virus, Hepatitis B virus, and human rhinovirus, and achieves the same or even better results than alignment-based methods. For SARS-CoV-2 in particular, our method also confirms that bats are potential intermediate hosts of the virus.
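The abstract does not give the exact CCFV formula, so the following is only a minimal sketch of one plausible reading, not the authors' definition. It assumes that, for each of the 16 ordered nucleotide pairs (a, b) and each delay l = 1..L, the feature is the Pearson correlation between the 0/1 indicator of a at position i and the 0/1 indicator of b at position i + l, which yields a 16 × L-dimensional vector. The function names `delay_correlation_vector` and `euclidean_distance`, and the choice of Euclidean distance for comparing two sequences, are hypothetical and introduced here for illustration only.

```python
from itertools import product
from math import sqrt

NUCLEOTIDES = "ACGT"


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if undefined."""
    n = len(x)
    if n == 0:
        return 0.0
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant indicator has no defined correlation
    return cov / sqrt(var_x * var_y)


def delay_correlation_vector(seq, max_delay):
    """Return a 16 * max_delay feature vector (assumed reading of CCFV).

    Features are ordered by nucleotide pair (AA, AC, ..., TT), then by delay.
    """
    seq = seq.upper()
    # One 0/1 indicator sequence per nucleotide.
    indicators = {
        n: [1.0 if c == n else 0.0 for c in seq] for n in NUCLEOTIDES
    }
    features = []
    for a, b in product(NUCLEOTIDES, repeat=2):
        for lag in range(1, max_delay + 1):
            x = indicators[a][:-lag]  # a at positions i
            y = indicators[b][lag:]   # b at positions i + lag
            features.append(pearson(x, y))
    return features


def euclidean_distance(u, v):
    """Distance between two feature vectors (an illustrative choice)."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))
```

Under this reading, each feature depends only on the nucleotides at positions i and i + l, that is, on a word of length l + 1 whose interior positions are ignored, which is one way to see the link to spaced mers mentioned in the abstract. A pairwise distance matrix built from such vectors could then be passed to a standard tree-building method without any alignment step.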
Pages: 13
Related papers
50 in total
  • [41] Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency
    Soares, Ines
    Goios, Ana
    Amorim, Antonio
    SCIENTIFIC WORLD JOURNAL, 2012
  • [42] Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches
    Horwege, Sebastian
    Lindner, Sebastian
    Boden, Marcus
    Hatje, Klas
    Kollmar, Martin
    Leimeister, Chris-Andre
    Morgenstern, Burkhard
    NUCLEIC ACIDS RESEARCH, 2014, 42 (W1) : W7 - W11
  • [43] Protein map: An alignment-free sequence comparison method based on various properties of amino acids
    Yu, Chenglong
    Cheng, Shiu-Yuen
    He, Rong L.
    Yau, Stephen S. -T.
    GENE, 2011, 486 (1-2) : 110 - 118
  • [44] Alignment-free viral sequence classification at scale
    Daniel J. van Zyl
    Marcel Dunaiski
    Houriiyah Tegally
    Cheryl Baxter
    Tulio de Oliveira
    Joicymara S. Xavier
    BMC Genomics, 26 (1)
  • [45] CAFE: aCcelerated Alignment-FrEe sequence analysis
    Lu, Yang Young
    Tang, Kujin
    Ren, Jie
    Fuhrman, Jed A.
    Waterman, Michael S.
    Sun, Fengzhu
    NUCLEIC ACIDS RESEARCH, 2017, 45 (W1) : W554 - W559
  • [46] An effective extension of the applicability of alignment-free biological sequence comparison algorithms with Hadoop
    Cattaneo, Giuseppe
    Petrillo, Umberto Ferraro
    Giancarlo, Raffaele
    Roscigno, Gianluca
    JOURNAL OF SUPERCOMPUTING, 2017, 73 (04) : 1467 - 1483
  • [47] Interpreting alignment-free sequence comparison: what makes a score a good score?
    Swain, Martin T.
    Vickers, Martin
    NAR GENOMICS AND BIOINFORMATICS, 2022, 4 (03)
  • [48] Extraction of high quality k-words for alignment-free sequence comparison
    Gunasinghe, Upuli
    Alahakoon, Damminda
    Bedingfield, Susan
    JOURNAL OF THEORETICAL BIOLOGY, 2014, 358 : 31 - 51
  • [49] Fast alignment-free sequence comparison using spaced-word frequencies
    Leimeister, Chris-Andre
    Boden, Marcus
    Horwege, Sebastian
    Lindner, Sebastian
    Morgenstern, Burkhard
    BIOINFORMATICS, 2014, 30 (14) : 1991 - 1999
  • [50] Alignment-Free Sequence Comparison Using N-Dimensional Similarity Space
    Jayalakshmi, Ramamurthy
    Natarajan, Ramanathan
    Vivekanandan, Munusamy
    Natarajan, Ganapathy S.
    CURRENT COMPUTER-AIDED DRUG DESIGN, 2010, 6 (04) : 290 - 296