Alignment-free sequence comparison for virus genomes based on location correlation coefficient

Cited by: 7
Authors
He, Lily [1]
Sun, Siyang [2]
Zhang, Qianyue [2]
Bao, Xiaona [1]
Li, Peter K. [3]
Affiliations
[1] Beijing Univ Civil Engn & Architecture, Sch Sci, Beijing 102616, Peoples R China
[2] Renmin Univ China, High Sch, Beijing 100080, Peoples R China
[3] Tsinghua Univ, Sch Life Sci, Beijing 100084, Peoples R China
Keywords
SARS-CoV-2; Alignment-free; Correlation measure; DNA sequence
DOI
10.1016/j.meegid.2021.105106
Chinese Library Classification (CLC) number
R51 [Infectious diseases]
Discipline classification code
100401
Abstract
Coronaviruses (especially SARS-CoV-2) are characterized by rapid mutation and wide spread. Because these characteristics easily lead to global pandemics, studying the evolutionary relationships between viruses is essential for clinical diagnosis. DNA sequencing has played an important role in evolutionary analysis. Recent alignment-free methods can overcome the problems of traditional alignment-based methods, which are costly in both time and space. This paper proposes a novel alignment-free method called the correlation coefficient feature vector (CCFV), which defines a correlation measure between the location of a nucleotide in the original DNA sequence and its location after an L-step delay. The resulting feature is a 16 × L-dimensional numerical vector describing the distribution of nucleotide positions in a DNA sequence. The proposed L-step delay correlation measure is related to some types of L + 1 spaced mers. Unlike traditional gene comparison, our method avoids the computational complexity of multiple sequence alignment and hence speeds up sequence comparison. We apply the method to the evolutionary analysis of common human viruses, including SARS-CoV-2, Dengue virus, Hepatitis B virus, and human rhinovirus, and it achieves results equal to or better than those of alignment-based methods. For SARS-CoV-2 in particular, our method also confirms that bats are potential intermediate hosts of the virus.
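The abstract does not give the exact CCFV formula, so the following is only an illustrative sketch of a lagged-correlation feature vector, not the authors' definition. It assumes that each of the 16 ordered nucleotide pairs (a, b) contributes, for every delay k = 1..L, the Pearson correlation between the indicator sequence of a and the indicator sequence of b shifted by k positions, which gives a 16 × L-dimensional vector. The function names (`lag_correlation_vector`, `pearson`, `euclidean`) and the choice of Euclidean distance for comparing two vectors are assumptions made for this example.

```python
from itertools import product
import math

NUCLEOTIDES = "ACGT"


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if undefined."""
    n = len(xs)
    if n == 0:
        return 0.0
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # A constant indicator has no variance, so correlation is undefined.
        return 0.0
    return cov / math.sqrt(vx * vy)


def lag_correlation_vector(seq, max_lag):
    """Return a 16 * max_lag feature vector (assumed form, see lead-in).

    Entry order: nucleotide pairs in product("ACGT", repeat=2) order,
    and within each pair the delays k = 1..max_lag.
    """
    seq = seq.upper()
    # 0/1 indicator sequence marking the positions of each nucleotide.
    indicators = {a: [1 if c == a else 0 for c in seq] for a in NUCLEOTIDES}
    features = []
    for a, b in product(NUCLEOTIDES, repeat=2):
        for k in range(1, max_lag + 1):
            xs = indicators[a][:-k]  # a at position i
            ys = indicators[b][k:]   # b at position i + k
            features.append(pearson(xs, ys))
    return features


def euclidean(u, v):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


if __name__ == "__main__":
    v1 = lag_correlation_vector("ACGTACGTTGCA", max_lag=3)
    v2 = lag_correlation_vector("ACGTTCGTAGCA", max_lag=3)
    print(len(v1), euclidean(v1, v2))
```

Under this assumed form, the link to spaced mers is visible in the code: the covariance term for pair (a, b) at delay k depends on how often a occurs at position i and b at position i + k, which is the count of a pattern of length k + 1 whose interior positions are ignored. Computing the vector takes time linear in the sequence length for each pair and delay, with no alignment step.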
Pages: 13
Related papers
50 in total
  • [31] SENSE: Siamese neural network for sequence embedding and alignment-free comparison
    Zheng, Wei
    Yang, Le
    Genco, Robert J.
    Wactawski-Wende, Jean
    Buck, Michael
    Sun, Yijun
    BIOINFORMATICS, 2019, 35 (11) : 1820 - 1828
  • [32] Alignment-free Sequence Comparison for Biologically Realistic Sequences of Moderate Length
    Burden, Conrad J.
    Jing, Junmei
    Wilson, Susan R.
    STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2012, 11 (01)
  • [33] MissMax: alignment-free sequence comparison with mismatches through filtering and heuristics
    Pizzi, Cinzia
    ALGORITHMS FOR MOLECULAR BIOLOGY, 2016, 11
  • [34] An Alignment-Free Distance Measure for Closely Related Genomes
    Haubold, Bernhard
    Domazet-Loso, Mirjana
    Wiehe, Thomas
    COMPARATIVE GENOMICS, PROCEEDINGS, 2008, 5267 : 87 - +
  • [35] Alignment-free genomic sequence comparison using FCGR and signal processing
    Lichtblau, Daniel
    BMC BIOINFORMATICS, 2019, 20 (01)
  • [36] Simplification of protein sequence and alignment-free sequence analysis
    Li Jing
    Li Feng-Bo
    Wang Wei
    PROGRESS IN BIOCHEMISTRY AND BIOPHYSICS, 2006, 33 (12) : 1215 - 1222
  • [37] Alignment-free genomic sequence comparison using FCGR and signal processing
    Daniel Lichtblau
    BMC Bioinformatics, 20
  • [38] Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification
    Borozan, Ivan
    Watt, Stuart
    Ferretti, Vincent
    BIOINFORMATICS, 2015, 31 (09) : 1396 - 1404
  • [39] MissMax: alignment-free sequence comparison with mismatches through filtering and heuristics
    Cinzia Pizzi
    Algorithms for Molecular Biology, 11
  • [40] Alignment-free sequence comparison with vector quantization and hidden Markov models
    Pham, T
    PROCEEDINGS OF THE 2003 IEEE BIOINFORMATICS CONFERENCE, 2003, : 534 - 535