Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer

Cited by: 33
Authors
Zhang, Qian [1, 2]
Jun, Se-Ran [2, 3]
Leuze, Michael [4, 5]
Ussery, David [2, 3]
Nookaew, Intawat [2, 3]
Affiliations
[1] Univ Tennessee, UT ORNL Grad Sch Genome Sci & Technol, Knoxville, TN 37996 USA
[2] Oak Ridge Natl Lab, Biosci Div, Comparat Genom Grp, Oak Ridge, TN 37831 USA
[3] Univ Arkansas Med Sci, Coll Med, Dept Biomed Informat, Little Rock, AR 72205 USA
[4] Univ Tennessee, Joint Inst Computat Sci, Knoxville, TN 37831 USA
[5] Oak Ridge Natl Lab, Comp Sci & Math Div, Computat Biomol Modeling & Bioinformat Grp, Oak Ridge, TN 37831 USA
Source
SCIENTIFIC REPORTS | 2017 / Vol. 7
Keywords
FEATURE FREQUENCY PROFILES; WHOLE-PROTEOME PHYLOGENY; GENOME PHYLOGENY; SEQUENCE; INFORMATION; VIRUS; INCONGRUENCE; EVOLUTION; TAXONOMY; KMACS;
DOI
10.1038/srep40712
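Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract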
The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained more than 2 million viral genome sequences and a reference set of ~4,000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral "tree of life". However, due to the lack of evolutionary conservation among diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.
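Two of the three criteria named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the standard Shannon index, H = -sum(p ln p) over k-mer relative frequencies, and it assumes that "common features" means the distinct k-mers shared by a pair of genomes, averaged over all pairs. The function names and the toy sequences are hypothetical, and cumulative relative entropy is omitted because the abstract does not define it.

```python
from collections import Counter
from itertools import combinations
from math import log


def kmer_counts(seq, k):
    """Count overlapping k-mers on a single strand (no ambiguity-code handling)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_diversity(counts):
    """Standard Shannon index H = -sum(p * ln p) over k-mer relative frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())


def average_common_features(profiles):
    """Mean number of distinct k-mers shared by each pair of profiles (assumed definition)."""
    pairs = list(combinations(profiles, 2))
    return sum(len(a.keys() & b.keys()) for a, b in pairs) / len(pairs)


# Toy sequences standing in for complete genomes (hypothetical data).
genomes = ["ACGTACGTAC", "ACGTTTGACA", "GGGACGTCCC"]

# Scan a range of k and report both statistics for each value.
for k in range(1, 5):
    profiles = [kmer_counts(g, k) for g in genomes]
    mean_h = sum(shannon_diversity(p) for p in profiles) / len(profiles)
    print(k, round(mean_h, 3), round(average_common_features(profiles), 3))
```

On real genomes, one would expect the number of shared k-mers to fall as k grows while per-genome diversity rises, which is why statistics like these can help bracket a usable range of k. The abstract does not state how the three criteria are combined.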
Pages: 13