Optimizing k-mer size using a variant grid search to enhance de novo genome assembly

Cited by: 0
Authors
Cha, Soyeon
Bird, David McK [1]
Affiliations
[1] NC State Univ, Bioinformat Res Ctr, Raleigh, NC 27695 USA
Keywords
ABySS; CEGMA; contigs; KmerGenie; N50; next-generation sequencing; SOAPdenovo; Velvet
DOI
Not available
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Largely driven by huge reductions in per-base costs, sequencing nucleic acids has become a near-ubiquitous technique in laboratories performing biological and biomedical research. Most of the effort goes to re-sequencing, but assembly of de novo-generated, raw sequence reads into contigs that span as much of the genome as possible is central to many projects. Although truly complete coverage is not realistically attainable, maximizing the amount of sequence that can be correctly assembled into contigs contributes to coverage. Here we compare three commonly used assembly algorithms (ABySS, Velvet and SOAPdenovo2), and show that empirical optimization of k-mer values has a disproportionate influence on de novo assembly of a eukaryotic genome, the nematode parasite Meloidogyne chitwoodi. Each assembler was challenged with approximately 40 million Illumina II paired-end reads, and assemblies were performed under a range of k-mer sizes. In each instance, the optimal k-mer was 127, although based on N50 values, ABySS was more efficient than the others. That the assembly was not spurious was established using the "Core Eukaryotic Gene Mapping Approach", which indicated that 98.79% of the M. chitwoodi genome was accounted for by the assembly. Subsequent gene finding and annotation are consistent with this and suggest that k-mer optimization contributes to the robustness of assembly.
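The abstract ranks assemblies built at different k-mer sizes by their N50 values. As a minimal sketch of that kind of comparison, and not the authors' actual pipeline, the Python below computes N50 from a list of contig lengths and picks the k with the highest N50. The function names `n50` and `best_k_by_n50`, the tie-breaking rule, and the toy contig lengths are all assumptions made for illustration; in practice the lengths would be read from each assembler's contig FASTA output.

```python
from typing import Dict, Iterable, List, Tuple


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which the running total of
    contig lengths, taken from longest to shortest, first reaches at
    least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def best_k_by_n50(assemblies: Dict[int, List[int]]) -> Tuple[int, int]:
    """Return (k, N50) for the k-mer size whose assembly has the highest N50.

    Ties are broken toward the larger k (an arbitrary, hypothetical choice).
    """
    scored = {k: n50(lengths) for k, lengths in assemblies.items()}
    best_k = max(scored, key=lambda k: (scored[k], k))
    return best_k, scored[best_k]


# Made-up contig lengths per k, for illustration only (not data from the paper).
toy = {
    31: [100, 200, 300, 400],
    63: [500, 300, 200],
    127: [800, 150, 50],
}

if __name__ == "__main__":
    k, score = best_k_by_n50(toy)
    print(f"best k = {k}, N50 = {score}")
```

N50 alone rewards contiguity rather than correctness, which is why the abstract pairs it with a CEGMA completeness check.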
Pages: 36-40
Page count: 5
Related papers
50 in total
  • [31] Factorial estimating assembly base errors using k-mer abundance difference (KAD) between short reads and genome assembled sequences
    He, Cheng
    Lin, Guifang
    Wei, Hairong
    Tang, Haibao
    White, Frank F.
    Valent, Barbara
    Liu, Sanzhen
    NAR GENOMICS AND BIOINFORMATICS, 2020, 2 (03)
  • [32] Genome-scale de novo assembly using ALGA
    Swat, Sylwester
    Laskowski, Artur
    Badura, Jan
    Frohmberg, Wojciech
    Wojciechowski, Pawel
    Swiercz, Aleksandra
    Kasprzak, Marta
    Blazewicz, Jacek
    BIOINFORMATICS, 2021, 37 (12) : 1644 - 1651
  • [33] The effects of sampling on the efficiency and accuracy of k-mer indexes: Theoretical and empirical comparisons using the human genome
    Almutairy, Meznah
    Torng, Eric
    PLOS ONE, 2017, 12 (07):
  • [34] GSearch: ultra-fast and scalable genome search by combining K-mer hashing with hierarchical navigable small world graphs
    Zhao, Jianshu
    Both, Jean Pierre
    Rodriguez-R, Luis M.
    Konstantinidis, Konstantinos T.
    NUCLEIC ACIDS RESEARCH, 2024, 52 (16)
  • [35] Assisted assembly: how to improve a de novo genome assembly by using related species
    Gnerre, Sante
    Lander, Eric S.
    Lindblad-Toh, Kerstin
    Jaffe, David B.
    GENOME BIOLOGY, 2009, 10 (08):
  • [36] Assisted assembly: how to improve a de novo genome assembly by using related species
    Sante Gnerre
    Eric S Lander
    Kerstin Lindblad-Toh
    David B Jaffe
    Genome Biology, 10
  • [37] Determination of the chromosome number and genome size of Garcinia mangostana L. via cytogenetics, flow cytometry and k-mer analyses
    Midin, Mohd Razik
    Nordin, Mohd Shukor
    Madon, Maria
    Saleh, Mohd Nazre
    Goh, Hoe-Han
    Noor, Normah Mohd
    CARYOLOGIA, 2018, 71 (01) : 35 - 44
  • [38] Effective de novo assembly of fish genome using haploid larvae
    Iwasaki, Yuki
    Nishiki, Issei
    Nakamura, Yoji
    Yasuike, Motoshige
    Kai, Wataru
    Nomura, Kazuharu
    Yoshida, Kazunori
    Nomura, Yousuke
    Fujiwara, Atushi
    Kobayashi, Takanori
    Ototake, Mitsuru
    GENE, 2016, 576 (02) : 644 - 649
  • [39] De novo diploid genome assembly using long noisy reads
    Fan Nie
    Peng Ni
    Neng Huang
    Jun Zhang
    Zhenyu Wang
    Chuanle Xiao
    Feng Luo
    Jianxin Wang
    Nature Communications, 15
  • [40] Finding simple sequence repeats (SSRs) within human genome using MapReduce based K-mer algorithm
    Mondal, Sudip
    Khatua, Sunirmal
    2018 FIFTH INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND GRID COMPUTING (IEEE PDGC), 2018, : 340 - 345