Optimizing k-mer size using a variant grid search to enhance de novo genome assembly

Cited by: 0
Authors
Cha, Soyeon
Bird, David McK. [1]
Affiliations
[1] NC State Univ, Bioinformat Res Ctr, Raleigh, NC 27695 USA
Keywords
ABySS; CEGMA; contigs; KmerGenie; N50; next-generation sequencing; SOAPdenovo; Velvet
DOI
Not available
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Largely driven by huge reductions in per-base costs, sequencing nucleic acids has become a near-ubiquitous technique in laboratories performing biological and biomedical research. Most of the effort goes to re-sequencing, but assembly of de novo-generated, raw sequence reads into contigs that span as much of the genome as possible is central to many projects. Although truly complete coverage is not realistically attainable, maximizing the amount of sequence that can be correctly assembled into contigs contributes to coverage. Here we compare three commonly used assembly algorithms (ABySS, Velvet and SOAPdenovo2), and show that empirical optimization of k-mer values has a disproportionate influence on de novo assembly of a eukaryotic genome, that of the nematode parasite Meloidogyne chitwoodi. Each assembler was challenged with ~40 million Illumina II paired-end reads, and assemblies were performed under a range of k-mer sizes. In each instance, the optimal k-mer was 127, although based on N50 values, ABySS was more efficient than the others. That the assembly was not spurious was established using the "Core Eukaryotic Gene Mapping Approach", which indicated that 98.79% of the M. chitwoodi genome was accounted for by the assembly. Subsequent gene finding and annotation are consistent with this and suggest that k-mer optimization contributes to the robustness of assembly.
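The abstract ranks assemblies across a range of k-mer sizes by their N50 values. As a minimal sketch of that selection step (not the authors' pipeline), the Python below computes N50 from a list of contig lengths and picks the k with the highest N50. The function names, the tie-breaking rule, and the toy contig lengths are assumptions made for illustration; a real run would obtain the lengths from the contig files written by each assembler at each k.

```python
from typing import Dict, Iterable, List


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together hold at least half of the assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def best_k_by_n50(assemblies: Dict[int, List[int]]) -> int:
    """Pick the k whose assembly has the highest N50.

    Ties go to the larger k (an arbitrary choice for this sketch).
    """
    return max(assemblies, key=lambda k: (n50(assemblies[k]), k))


# Hypothetical contig lengths per k-mer size, for illustration only.
toy_assemblies = {
    31: [500, 400, 300, 200, 100],
    63: [900, 300, 200, 100],
    127: [1200, 200, 100],
}

if __name__ == "__main__":
    for k, lengths in sorted(toy_assemblies.items()):
        print(f"k={k}: N50={n50(lengths)}")
    print("selected k:", best_k_by_n50(toy_assemblies))
```

N50 alone rewards contiguity rather than correctness, which is why the abstract pairs it with a CEGMA completeness check; a fuller selection routine would probably weigh both.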
Pages: 36-40
Page count: 5
Related papers
50 in total
  • [1] Informed and automated k-mer size selection for genome assembly
    Chikhi, Rayan
    Medvedev, Paul
    BIOINFORMATICS, 2014, 30 (01) : 31 - 37
  • [2] Compact representation of k-mer de Bruijn graphs for genome read assembly
    Rodland, Einar Andreas
    BMC BIOINFORMATICS, 2013, 14
  • [3] Compact representation of k-mer de Bruijn graphs for genome read assembly
    Einar Andreas Rødland
    BMC Bioinformatics, 14
  • [4] GGAKE: GPU based Genome Assembly using K-mer Extension
    Garg, Anshuj
    Jain, Ashutosh
    Paul, Kolin
    2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013, : 1105 - 1112
  • [5] findGSEP: estimating genome size of polyploid species using k-mer frequencies
    Fu, Laiyi
    Xie, Yanxin
    Ling, Shunkang
    Wang, Ying
    Wang, Binzhong
    Du, Hejun
    Peng, Qinke
    Sun, Hequan
    BIOINFORMATICS, 2024, 40 (11)
  • [6] A pipeline for the de novo assembly of the Themira biloba (Sepsidae: Diptera) transcriptome using a multiple k-mer length approach
    Melicher, Dacotah
    Torson, Alex S.
    Dworkin, Ian
    Bowsher, Julia H.
    BMC GENOMICS, 2014, 15
  • [7] A pipeline for the de novo assembly of the Themira biloba (Sepsidae: Diptera) transcriptome using a multiple k-mer length approach
    Dacotah Melicher
    Alex S Torson
    Ian Dworkin
    Julia H Bowsher
    BMC Genomics, 15
  • [8] Complete Taiwanese Macaque (Macaca cyclopis) Mitochondrial Genome: Reference-Assisted de novo Assembly with Multiple k-mer Strategy
    Huang, Yu-Feng
    Midha, Mohit
    Chen, Tzu-Han
    Wang, Yu-Tai
    Smith, David Glenn
    Pei, Kurtis Jai-Chyi
    Chiu, Kuo Ping
    PLOS ONE, 2015, 10 (06):
  • [9] Comparison of De Novo Transcriptome Assemblers and k-mer Strategies Using the Killifish, Fundulus heteroclitus
    Rana, Satshil B.
    Zadlock, Frank J.
    Zhang, Ziping
    Murphy, Wyatt R.
    Bentivegna, Carolyn S.
    PLOS ONE, 2016, 11 (04):
  • [10] Optimizing Spaced k-mer Neighbors for Efficient Filtration in Protein Similarity Search
    Li, Weiming
    Ma, Bin
    Zhang, Kaizhong
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2014, 11 (02) : 398 - 406