Konnector: Connecting Paired-end Reads Using a Bloom Filter de Bruijn Graph

Cited by: 0
Authors
Vandervalk, Benjamin P. [1]
Jackman, Shaun D. [1]
Raymond, Anthony [1]
Mohamadi, Hamid [1]
Yang, Chen [1]
Attali, Dean A. [1]
Chu, Justin [1]
Warren, Rene L. [1]
Birol, Inanc [1]
Affiliations
[1] British Columbia Cancer Agency, Genome Sciences Centre, Vancouver, BC, Canada
Keywords
Bloom filter; de Bruijn graph; paired-end sequencing; de novo genome assembly; DNA-SEQUENCES; ALIGNMENT; GENOMES; TOOL
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Paired-end sequencing yields a read from each end of a DNA fragment, typically leaving a gap of unsequenced nucleotides in the middle. Closing this gap using information from other reads in the same sequencing experiment offers the potential to generate longer "pseudo-reads" using short-read sequencing platforms. Such long reads may benefit downstream applications such as de novo sequence assembly, gap filling, and variant detection. With these possible applications in mind, we have developed Konnector, a software tool that fills in the nucleotides of the sequence gap between read pairs by navigating a de Bruijn graph. Konnector represents the de Bruijn graph using a Bloom filter, a probabilistic and memory-efficient data structure. Our implementation is able to store the de Bruijn graph using a mean of 1.5 bytes of memory per k-mer, which represents a marked improvement over the typical hash table data structure. The memory usage per k-mer is independent of the k-mer length, enabling application of the tool to large genomes. We report the performance of the tool on simulated and experimental datasets, and discuss its utility for downstream analysis.
Availability: Konnector is open-source software, free for academic use, released under the British Columbia Cancer Agency's academic license. The tool is included with ABySS version 1.5.2 and later, and is available for download from http://www.bcgsc.ca/platform/bioinfo/software/abyss.
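The idea described in the abstract, storing the k-mers of all reads in a Bloom filter and walking the implied de Bruijn graph from one read of a pair to the other, can be illustrated with a minimal Python sketch. This is not Konnector's implementation: the class `BloomFilter`, the functions `add_kmers`, `successors`, and `connect`, the BLAKE2-based hashing, the filter size, and the toy sequence are all assumptions made for illustration. The sketch also assumes that both reads are given on the same strand and do not overlap, ignores reverse complements, and returns the first path found by a one-directional breadth-first search.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Toy Bloom filter over strings (illustrative, not Konnector's)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def add_kmers(bloom, reads, k):
    """Insert every k-mer of every read; the graph edges stay implicit."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])


def successors(bloom, kmer):
    """Neighbours are found by querying the four possible extensions."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


def connect(bloom, read1, read2, k, max_steps):
    """Breadth-first search from the last k-mer of read1 to the first
    k-mer of read2; returns the joined pseudo-read or None."""
    start, goal = read1[-k:], read2[:k]
    queue = deque([(start, read1)])
    seen = {start}
    while queue:
        kmer, seq = queue.popleft()
        if kmer == goal:
            return seq + read2[k:]
        if len(seq) - len(read1) >= max_steps:
            continue  # give up on paths longer than the allowed gap
        for nxt in successors(bloom, kmer):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, seq + nxt[-1]))
    return None


# Hypothetical 30-nt fragment; the pair covers only its two ends.
genome = "ACGTTGCATGCCGATAGCTTAACGGATCCA"
read1, read2 = genome[:10], genome[-10:]
# Other reads from the same (simulated) experiment cover the middle.
reads = [genome[i:i + 12] for i in range(0, 19, 3)]

bloom = BloomFilter(num_bits=1 << 16, num_hashes=3)
add_kmers(bloom, reads, k=6)
pseudo_read = connect(bloom, read1, read2, k=6, max_steps=30)
print(pseudo_read)
```

Because the filter stores only bits, its size does not depend on k, which is consistent with the abstract's remark that memory per k-mer is independent of k-mer length. For orientation only: a textbook single Bloom filter with 12 bits (1.5 bytes) per element and the optimal number of hash functions has a false-positive rate of roughly 0.3%; the abstract does not state whether Konnector's figure corresponds to that configuration.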
Pages: 8