KmerAperture: Retaining k-mer synteny for alignment-free extraction of core and accessory differences between bacterial genomes

Cited by: 1
Authors
Moore, Matthew P. [1, 2]
Laager, Mirjam [3]
Ribeca, Paolo [4]
Didelot, Xavier [1, 2]
Affiliations
[1] Univ Warwick, Sch Life Sci, Coventry, England
[2] Univ Warwick, Dept Stat, Coventry, England
[3] Univ Hosp Basel, Div Transplant Immunol & Nephrol, Basel, Switzerland
[4] UK Hlth Secur Agcy, London, England
Source
PLOS GENETICS | 2024 / Volume 20 / Issue 04
Keywords
SEQUENCE;
DOI
10.1371/journal.pgen.1011184
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
By decomposing genome sequences into k-mers, it is possible to estimate genome differences without alignment. Techniques such as k-mer minimisers, for example MinHash, have been developed and are often accurate approximations of distances based on full k-mer sets. These and other alignment-free methods avoid the large time and computational cost of alignment. However, these k-mer set comparisons are not entirely accurate within species and can be completely inaccurate within a lineage. This is due, in part, to their inability to distinguish core polymorphism from accessory differences. Here we present a new approach, KmerAperture, which uses information on the relative genomic positions of k-mers to determine the type of polymorphism causing differences in k-mer presence and absence between pairs of genomes. A single SNP is expected to result in a contiguous series of k unique k-mers per genome. On the other hand, a contiguous series of length S > k may be caused by an accessory difference of length S-k+1, when the start and end of the sequence are contiguous with homologous sequence. Alternatively, such a series may be caused by multiple SNPs within k bp of each other, and KmerAperture can determine whether that is the case. To demonstrate use cases, KmerAperture was benchmarked using datasets including a very low diversity simulated population with accessory content independent of the number of SNPs, a simulated population where SNPs are spatially dense, a moderately diverse real cluster of genomes (Escherichia coli ST1193) with a large accessory genome, and a low diversity real genome cluster (Salmonella Typhimurium ST34). We show that KmerAperture can accurately distinguish both core and accessory sequence diversity without alignment, outperforming other k-mer based tools.
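The run-length reasoning in the abstract can be illustrated with a minimal Python sketch. This is not the KmerAperture implementation: the function names `kmers`, `unique_runs` and `classify_runs` are hypothetical, the toy sequences are made up, and the sketch assumes linear sequences on a single strand with no repeated k-mers. It only flags runs longer than k as "accessory or clustered SNPs" and does not attempt the further step, described in the abstract, of telling those two cases apart.

```python
def kmers(seq, k):
    """Return the k-mers of seq in genomic order (index = start position)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def unique_runs(query, reference, k):
    """Find maximal runs of consecutive query k-mers absent from reference.

    Returns a list of (start_position, run_length) tuples.
    Assumption: presence/absence is tested against a plain set of
    reference k-mers, ignoring reverse complements.
    """
    ref_set = set(kmers(reference, k))
    runs, start = [], None
    query_kmers = kmers(query, k)
    for i, kmer in enumerate(query_kmers):
        if kmer not in ref_set:
            if start is None:
                start = i          # a new run of unique k-mers begins
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run reaches the end of the sequence
        runs.append((start, len(query_kmers) - start))
    return runs


def classify_runs(runs, k):
    """Label each run using the run-length rule from the abstract.

    length == k -> candidate single SNP
    length  > k -> candidate accessory difference of length S - k + 1,
                   or several SNPs within k bp of each other (unresolved here)
    length  < k -> too short for either rule (e.g. at a sequence end)
    """
    labelled = []
    for start, s in runs:
        if s == k:
            labelled.append((start, s, "snp", None))
        elif s > k:
            labelled.append((start, s, "accessory_or_clustered_snps", s - k + 1))
        else:
            labelled.append((start, s, "short", None))
    return labelled


if __name__ == "__main__":
    k = 4
    reference = "ACGTTGCAAGCTTAGGCATCGA"
    # Toy query with one substitution (C -> A at position 10).
    snp_query = "ACGTTGCAAGATTAGGCATCGA"
    # Toy query with a 6 bp insertion flanked by homologous sequence.
    ins_query = "ACGTTGCAAGC" + "CCCGGG" + "TTAGGCATCGA"
    print(classify_runs(unique_runs(snp_query, reference, k), k))
    print(classify_runs(unique_runs(ins_query, reference, k), k))
```

In the toy data, the substitution makes exactly k query k-mers unique, while the 6 bp insertion makes S = 6 + k - 1 k-mers unique, so S-k+1 recovers the inserted length.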
Pages: 17