Alignment-free sequence comparison for virus genomes based on location correlation coefficient

Cited by: 7
Authors
He, Lily [1]
Sun, Siyang [2]
Zhang, Qianyue [2]
Bao, Xiaona [1]
Li, Peter K. [3]
Affiliations
[1] Beijing Univ Civil Engn & Architecture, Sch Sci, Beijing 102616, Peoples R China
[2] Renmin Univ China, High Sch, Beijing 100080, Peoples R China
[3] Tsinghua Univ, Sch Life Sci, Beijing 100084, Peoples R China
Keywords
SARS-CoV-2; Alignment-free; Correlation measure; DNA sequence
DOI
10.1016/j.meegid.2021.105106
Chinese Library Classification number
R51 [Infectious diseases]
Discipline classification code
100401
Abstract
Coronaviruses, especially SARS-CoV-2, are characterized by rapid mutation and wide transmission. Because these characteristics can easily lead to global pandemics, studying the evolutionary relationships between viruses is essential for clinical diagnosis. DNA sequencing has played an important role in evolutionary analysis. Recent alignment-free methods overcome a central problem of traditional alignment-based methods, which are costly in both time and space. This paper proposes a novel alignment-free method called the correlation coefficient feature vector (CCFV), which defines a correlation measure between the locations of a nucleotide in the original DNA sequence and its locations after an L-step delay. The resulting feature is a 16 × L-dimensional numerical vector describing the distribution characteristics of the nucleotide positions in a DNA sequence. The proposed L-step delay correlation measure is related to certain types of (L + 1)-spaced mers. Unlike traditional gene comparison, the method avoids the computational complexity of multiple sequence alignment and hence speeds up sequence comparison. The method is applied to the evolutionary analysis of common human viruses, including SARS-CoV-2, Dengue virus, Hepatitis B virus, and human rhinovirus, and achieves the same or even better results than alignment-based methods. For SARS-CoV-2 in particular, the method also confirms that bats are potential intermediate hosts of SARS-CoV-2.
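The abstract does not give the exact CCFV formula, so the following is only a rough sketch of one way a lag-based correlation feature could be built, not the authors' definition. It assumes that each of the 16 ordered nucleotide pairs (a, b) contributes, for every delay from 1 to L, the Pearson correlation between the indicator of a at position i and the indicator of b at position i + delay. The function names (`indicator`, `pearson`, `lag_correlation_vector`, `euclidean`) and the use of Euclidean distance are hypothetical choices made for illustration.

```python
from itertools import product
from math import sqrt

NUCLEOTIDES = "ACGT"


def indicator(seq, base):
    """Return a 0/1 list marking the positions where `base` occurs."""
    return [1.0 if ch == base else 0.0 for ch in seq]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(x)
    if n == 0:
        return 0.0
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # constant indicator: correlation is not defined
    return cov / sqrt(vx * vy)


def lag_correlation_vector(seq, max_lag):
    """Assumed feature: 16 * max_lag correlations, one per (delay, a, b)."""
    seq = seq.upper()
    ind = {b: indicator(seq, b) for b in NUCLEOTIDES}
    features = []
    for lag in range(1, max_lag + 1):
        for a, b in product(NUCLEOTIDES, repeat=2):
            # Pair position i (base a) with position i + lag (base b).
            features.append(pearson(ind[a][:-lag], ind[b][lag:]))
    return features


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    v1 = lag_correlation_vector("ACGTACGTACGTTTGA", 3)
    v2 = lag_correlation_vector("ACGTACGAACGTTAGA", 3)
    print(len(v1), euclidean(v1, v2))
```

Under these assumptions, a pair (a, b) at delay l corresponds to a spaced pattern of length l + 1 with a in the first position and b in the last, which is one plausible reading of the link to spaced mers mentioned in the abstract. Pairwise distances between such vectors could then be passed to a standard tree-building method without any sequence alignment.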
Pages: 13