Compact representation of k-mer de Bruijn graphs for genome read assembly

Cited by: 13
Author
Rødland, Einar Andreas
Affiliations
[1] Univ Oslo, Ctr Canc Biomed, N-0316 Oslo, Norway
[2] Oslo Univ Hosp, Norwegian Radium Hosp, Inst Canc Res, Dept Tumor Biol, N-0424 Oslo, Norway
Source
BMC Bioinformatics | 2013 / Volume 14
Keywords
Memory Usage; Bloom Filter; Vertex Group; Read Error; Suffix Array
DOI
10.1186/1471-2105-14-313
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Reads from high-throughput sequencing are often processed in terms of edges in the de Bruijn graph representing all k-mers from the reads. Storing all k-mers in a lookup table can demand a great deal of memory, even after read errors have been removed, but a memory-efficient data structure can alleviate this.
Results: The FM-index, which is based on the Burrows-Wheeler transform, is an efficient data structure that provides a searchable index of all substrings of a set of strings. It is used to represent full genomes compactly when mapping reads to a genome, and the memory required to store it is of the same order of magnitude as the strings themselves. However, reads from high-throughput sequencing mostly have high coverage, so the same substrings occur many times over in different reads. I here present a modification of the FM-index, which I call the kFM-index, for indexing the set of k-mers from the reads. For DNA sequences, this requires 5 bits of information for each vertex of the corresponding de Bruijn subgraph, i.e. for each distinct (k-1)-mer, plus some additional overhead, typically 0.5 to 1 bit per vertex, for storing the equivalent of the FM-index needed to walk the underlying de Bruijn graph and reproduce the actual k-mers efficiently.
Conclusions: The kFM-index could replace more memory-demanding data structures for storing the de Bruijn k-mer graph representation of sequence reads. A Java implementation with additional technical documentation is provided, which demonstrates the applicability of the data structure (http://folk.uio.no/einarro/Projects/KFM-index/).
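The quantities named in the abstract can be illustrated with a minimal sketch. It is not the kFM-index and is not taken from the author's Java implementation. It only collects the distinct k-mers (edges) and (k-1)-mers (vertices) from a set of reads using plain Python sets, then applies the abstract's figures of 5 bits per vertex plus 0.5 to 1 bit of overhead. The function names, the example reads, and the 2-bits-per-base baseline for a plain k-mer table are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Set, Tuple


def kmers(read: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def de_bruijn_sets(reads: Iterable[str], k: int) -> Tuple[Set[str], Set[str]]:
    """Return (vertices, edges) of the de Bruijn subgraph spanned by the reads.

    Edges are the distinct k-mers; vertices are the distinct (k-1)-mers
    that occur as a prefix or suffix of some k-mer.
    """
    vertices: Set[str] = set()
    edges: Set[str] = set()
    for read in reads:
        for kmer in kmers(read, k):
            edges.add(kmer)
            vertices.add(kmer[:-1])  # source vertex of the edge
            vertices.add(kmer[1:])   # target vertex of the edge
    return vertices, edges


def estimated_bits(n_vertices: int, overhead_bits_per_vertex: float = 1.0) -> float:
    """Memory estimate from the abstract: 5 bits per vertex plus overhead.

    The abstract gives the overhead as typically 0.5 to 1 bit per vertex;
    the default of 1.0 is the upper end of that range.
    """
    return n_vertices * (5.0 + overhead_bits_per_vertex)


def naive_bits(n_edges: int, k: int) -> int:
    """Assumed baseline: each distinct k-mer stored at 2 bits per base."""
    return n_edges * 2 * k


if __name__ == "__main__":
    # Hypothetical toy reads; real data would have far longer reads and larger k.
    reads = ["ACGTAC", "CGTACG"]
    vertices, edges = de_bruijn_sets(reads, k=3)
    print(len(vertices), "vertices,", len(edges), "edges")
    print("estimate (bits):", estimated_bits(len(vertices)))
    print("plain 2-bit table (bits):", naive_bits(len(edges), 3))
```

With a k this small the two estimates are close. The per-vertex cost in the abstract does not grow with k, whereas the assumed plain table grows linearly in k, so the gap widens for the larger k values used in assembly.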
Pages: 19
Related papers
  • [1] Compact representation of k-mer de Bruijn graphs for genome read assembly
    Einar Andreas Rødland
    [J]. BMC Bioinformatics, 14
  • [2] K-mer Mapping and De Bruijn Graphs: the case for Velvet Fragment Assembly
    de Armas, Elvismary Molina
    Haeusler, Edward Hermann
    Lifschitz, Sergio
    de Holanda, Maristela Terto
    Cordeiro da Silva, Waldeyr Mendes
    Gomes Ferreira, Paulo Cavalcanti
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2016, : 882 - 889
  • [3] Enhanced Compression of k-Mer Sets with Counters via de Bruijn Graphs
    Rossignolo, Enrico
    Comin, Matteo
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2024, 31 (06) : 524 - 538
  • [4] How to apply de Bruijn graphs to genome assembly
    Phillip E C Compeau
    Pavel A Pevzner
    Glenn Tesler
    [J]. Nature Biotechnology, 2011, 29 : 987 - 991
  • [5] Integration of string and de Bruijn graphs for genome assembly
    Huang, Yao-Ting
    Liao, Chen-Fu
    [J]. BIOINFORMATICS, 2016, 32 (09) : 1301 - 1307
  • [6] How to apply de Bruijn graphs to genome assembly
    Compeau, Phillip E. C.
    Pevzner, Pavel A.
    Tesler, Glenn
    [J]. NATURE BIOTECHNOLOGY, 2011, 29 (11) : 987 - 991
  • [7] Informed and automated k-mer size selection for genome assembly
    Chikhi, Rayan
    Medvedev, Paul
    [J]. BIOINFORMATICS, 2014, 30 (01) : 31 - 37
  • [8] On the Representation of de Bruijn Graphs
    Chikhi, Rayan
    Limasset, Antoine
    Jackman, Shaun
    Simpson, Jared T.
    Medvedev, Paul
    [J]. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB2014, 2014, 8394 : 35 - 55
  • [9] On the Representation of De Bruijn Graphs
    Chikhi, Rayan
    Limasset, Antoine
    Jackman, Shaun
    Simpson, Jared T.
    Medvedev, Paul
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2015, 22 (05) : 336 - 352
  • [10] Combining De Bruijn Graphs, Overlap Graphs and Microassembly for De Novo Genome Assembly
    Sergushichev, A. A.
    Alexandrov, A. V.
    Kazakov, S. V.
    Tsarev, F. N.
    Shalyto, A. A.
    [J]. IZVESTIYA SARATOVSKOGO UNIVERSITETA NOVAYA SERIYA-MATEMATIKA MEKHANIKA INFORMATIKA, 2013, 13 (02) : 10 - 10