Konnector: Connecting Paired-end Reads Using a Bloom Filter de Bruijn Graph

Authors
Vandervalk, Benjamin P. [1]
Jackman, Shaun D. [1]
Raymond, Anthony [1]
Mohamadi, Hamid [1]
Yang, Chen [1]
Attali, Dean A. [1]
Chu, Justin [1]
Warren, Rene L. [1]
Birol, Inanc [1]
Affiliation
[1] BC Canc Agcy, Genome Sci Ctr, Vancouver, BC, Canada
Keywords
Bloom filter; de Bruijn graph; paired-end sequencing; de novo genome assembly; DNA-SEQUENCES; ALIGNMENT; GENOMES; TOOL
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Paired-end sequencing yields a read from each end of a DNA fragment, typically leaving a gap of unsequenced nucleotides in the middle. Closing this gap using information from other reads in the same sequencing experiment offers the potential to generate longer "pseudo-reads" using short-read sequencing platforms. Such long reads may benefit downstream applications such as de novo sequence assembly, gap filling, and variant detection. With these possible applications in mind, we have developed Konnector, a software tool that fills in the nucleotides of the sequence gap between read pairs by navigating a de Bruijn graph. Konnector represents the de Bruijn graph using a Bloom filter, a probabilistic and memory-efficient data structure. Our implementation is able to store the de Bruijn graph using a mean of 1.5 bytes of memory per k-mer, which represents a marked improvement over the typical hash table data structure. The memory usage per k-mer is independent of the k-mer length, enabling application of the tool to large genomes. We report the performance of the tool on simulated and experimental datasets, and discuss its utility for downstream analysis.
Availability: Konnector is open-source software, free for academic use, released under the British Columbia Cancer Agency's academic license. The tool is included with ABySS version 1.5.2 and later, and is available for download from http://www.bcgsc.ca/platform/bioinfo/software/abyss.
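The idea described in the abstract can be illustrated with a minimal Python sketch. This is not Konnector's actual code: the names `BloomFilter`, `load_kmers`, `successors`, and `connect` are hypothetical, the hash scheme (salted BLAKE2b) and the breadth-first search are assumptions chosen for brevity, and the sketch assumes both reads are already on the same strand, do not overlap, and contain no sequencing errors. It shows the two ingredients named above: k-mers are stored only as bits in a Bloom filter, and graph edges are recovered implicitly by querying the four possible one-base extensions of a k-mer.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Fixed-size bit array probed by several hash functions (illustrative)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: one salted BLAKE2b digest per hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(item.encode(), digest_size=8,
                                     salt=seed.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return True for an item never added (false positive),
        # but never returns False for an item that was added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def load_kmers(bloom, reads, k):
    """Insert every k-mer of every read into the Bloom filter."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])


def successors(bloom, kmer):
    """Edges are implicit: test the four possible one-base extensions."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in bloom]


def connect(bloom, read1, read2, k, max_steps):
    """Breadth-first search from the last k-mer of read1 to the first
    k-mer of read2; returns the merged sequence, or None if no path of
    at most max_steps edges is found."""
    start, goal = read1[-k:], read2[:k]
    parent = {start: None}
    queue = deque([(start, 0)])
    while queue:
        kmer, depth = queue.popleft()
        if kmer == goal:
            path = []
            while kmer is not None:  # walk back to the start k-mer
                path.append(kmer)
                kmer = parent[kmer]
            path.reverse()
            bridge = path[0] + "".join(p[-1] for p in path[1:])
            return read1[:-k] + bridge + read2[k:]
        if depth < max_steps:
            for nxt in successors(bloom, kmer):
                if nxt not in parent:
                    parent[nxt] = kmer
                    queue.append((nxt, depth + 1))
    return None


# Toy usage: tile a short made-up sequence with reads, then bridge a pair.
toy = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACCGGTATCGCAATGCCTGAAGT"
toy_bloom = BloomFilter(num_bits=1 << 16, num_hashes=3)
load_kmers(toy_bloom, [toy[i:i + 20] for i in range(0, 41, 5)], k=8)
print(connect(toy_bloom, toy[:20], toy[40:], k=8, max_steps=40))
```

Because the filter stores only bits, its size is set by the number of k-mers and the chosen bits per k-mer, not by k itself; the 1.5 bytes per k-mer quoted in the abstract corresponds to 12 bits per k-mer. The toy filter here is deliberately oversized so that false-positive branches are negligible.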
Pages: 8