On the Representation of De Bruijn Graphs

Cited by: 29
Authors
Chikhi, Rayan [1, 6]
Limasset, Antoine [3]
Jackman, Shaun [4]
Simpson, Jared T. [5]
Medvedev, Paul [1, 2, 6]
Affiliations
[1] Penn State Univ, Dept Comp Sci & Engn, State Coll, PA USA
[2] Penn State Univ, Dept Biochem & Mol Biol, State Coll, PA USA
[3] ENS Cachan Brittany, Bruz, France
[4] Canada's Michael Smith Genome Sci Ctr, Vancouver, BC, Canada
[5] Ontario Inst Canc Res, Toronto, ON, Canada
[6] Penn State Univ, Genome Sci Inst Huck, State Coll, PA USA
Funding
National Science Foundation (USA);
Keywords
SHORT READ ALIGNMENT; LARGE GENOMES; SEQUENCE DATA; ASSEMBLIES; ALGORITHMS;
DOI
10.1089/cmb.2014.0160
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
The de Bruijn graph plays an important role in bioinformatics, especially in the context of de novo assembly. However, the representation of the de Bruijn graph in memory is a computational bottleneck for many assemblers. Recent papers proposed a navigational data structure approach in order to improve memory usage. We prove several theoretical space lower bounds to show the limitations of these types of approaches. We further design and implement a general data structure (dbgfm) and demonstrate its use on a human whole-genome dataset, achieving space usage of 1.5 GB and a 46% improvement over previous approaches. As part of dbgfm, we develop the notion of frequency-based minimizers and show how it can be used to enumerate all maximal simple paths of the de Bruijn graph using only 43 MB of memory. Finally, we demonstrate that our approach can be integrated into an existing assembler by modifying the ABySS software to use dbgfm.
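The abstract names frequency-based minimizers without defining them. The sketch below shows one plausible reading, under the assumption that the minimizer of a k-mer is its rarest l-mer (by count across the input k-mers), with ties broken lexicographically. The function names `lmer_counts`, `frequency_minimizer`, and `partition_by_minimizer` are hypothetical, and this is not the dbgfm implementation.

```python
from collections import Counter


def lmer_counts(kmers, l):
    """Count every l-mer occurrence across the given k-mers."""
    counts = Counter()
    for kmer in kmers:
        for i in range(len(kmer) - l + 1):
            counts[kmer[i:i + l]] += 1
    return counts


def frequency_minimizer(kmer, l, counts):
    """Return the l-mer of `kmer` with the lowest count.

    Assumption: ties are broken by lexicographic order.
    """
    lmers = [kmer[i:i + l] for i in range(len(kmer) - l + 1)]
    return min(lmers, key=lambda m: (counts[m], m))


def partition_by_minimizer(kmers, l):
    """Group k-mers into buckets keyed by their frequency-based minimizer."""
    counts = lmer_counts(kmers, l)
    buckets = {}
    for kmer in kmers:
        key = frequency_minimizer(kmer, l, counts)
        buckets.setdefault(key, []).append(kmer)
    return buckets


if __name__ == "__main__":
    # Toy input: three 4-mers, partitioned by their 2-mer minimizers.
    print(partition_by_minimizer(["ACGT", "CGTA", "GTAC"], 2))
```

Preferring rare l-mers tends to spread k-mers over many small buckets, which is what makes it possible to process one bucket at a time with little memory.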
Pages: 336-352
Page count: 17
Related papers
  • [1] On the Representation of de Bruijn Graphs
    Chikhi, Rayan
    Limasset, Antoine
    Jackman, Shaun
    Simpson, Jared T.
    Medvedev, Paul
    [J]. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB2014, 2014, 8394 : 35 - 55
  • [2] Simplitigs as an efficient and scalable representation of de Bruijn graphs
    Karel Břinda
    Michael Baym
    Gregory Kucherov
    [J]. Genome Biology, 22
  • [3] Simplitigs as an efficient and scalable representation of de Bruijn graphs
    Brinda, Karel
    Baym, Michael
    Kucherov, Gregory
    [J]. GENOME BIOLOGY, 2021, 22 (01)
  • [4] De Bruijn sequences and De Bruijn graphs for a general language
    Moreno, E
    [J]. INFORMATION PROCESSING LETTERS, 2005, 96 (06) : 214 - 219
  • [5] Generalized de Bruijn graphs
    Malyshev, FM
    Tarakanov, VE
    [J]. MATHEMATICAL NOTES, 1997, 62 (3-4) : 449 - 456
  • [6] Generalized de Bruijn graphs
    F. M. Malyshev
    V. E. Tarakanov
    [J]. Mathematical Notes, 1997, 62 : 449 - 456
  • [7] Enhanced de Bruijn graphs
    Guzide, O
    Wagh, MD
    [J]. AMCS '05: PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON ALGORITHMIC MATHEMATICS AND COMPUTER SCIENCE, 2005, : 23 - 28
  • [8] Shifted de Bruijn Graphs
    Freij, Ragnar
    [J]. CODING THEORY AND APPLICATIONS, 4TH INTERNATIONAL CASTLE MEETING, 2015, 3 : 195 - 202
  • [9] Manifold de Bruijn Graphs
    Lin, Yu
    Pevzner, Pavel A.
    [J]. ALGORITHMS IN BIOINFORMATICS, 2014, 8701 : 296 - 310
  • [10] Bijections in de Bruijn Graphs
    Rukavicka, Josef
    [J]. ARS COMBINATORIA, 2019, 143 : 215 - 226