Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers

Authors
Kanak Mahadik
Christopher Wright
Milind Kulkarni
Saurabh Bagchi
Somali Chaterji
Affiliations
[1] Adobe Research
[2] Purdue University
Abstract
Remarkable advancements in high-throughput gene sequencing technologies have led to exponential growth in the number of sequenced genomes. However, the unavailability of highly parallel and scalable de novo assembly algorithms has hindered biologists attempting to swiftly assemble high-quality complex genomes. Popular de Bruijn graph assemblers, such as IDBA-UD, generate high-quality assemblies by iterating over a set of k-values used in the construction of de Bruijn graphs (DBGs). However, sequentially iterating from small to large k-values slows down assembly. In this paper, we propose ScalaDBG, which transforms this sequential process by building the DBG for each distinct k-value in parallel. We develop an innovative mechanism to “patch” a higher k-valued graph with contigs generated from a lower k-valued graph. Moreover, ScalaDBG leverages multi-level parallelism, both scaling up on all cores of a node and scaling out to multiple nodes simultaneously. We demonstrate that ScalaDBG assembles genomes faster than IDBA-UD with similar accuracy on a variety of datasets (6.8x faster for one of the most complex genomes in our dataset).
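The core idea in the abstract, building one de Bruijn graph per k-value independently rather than one after another, can be illustrated with a minimal sketch. This is not ScalaDBG's implementation: the function names `build_dbg` and `build_dbgs_for_all_k` are hypothetical, the graph is a plain dictionary, and the "patching" step and multi-node scale-out are not shown.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_dbg(reads, k):
    """Build a de Bruijn graph whose nodes are (k-1)-mers and edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def build_dbgs_for_all_k(reads, k_values):
    """Build one graph per k-value as independent tasks (hypothetical helper).

    Threads are used only to keep the sketch simple; in CPython they do not
    give a real speed-up for CPU-bound work, so a process pool or multiple
    nodes would be needed for actual parallelism.
    """
    with ThreadPoolExecutor() as pool:
        graphs = pool.map(lambda k: build_dbg(reads, k), k_values)
        return dict(zip(k_values, graphs))


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "CGTACG"]  # toy input, not real sequencing data
    for k, graph in build_dbgs_for_all_k(toy_reads, [3, 4]).items():
        print(k, graph)
```

Because each graph depends only on the reads and its own k, the tasks share no state, which is what makes the per-k construction independent in this sketch.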
Related papers
  • [1] Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers
    Mahadik, Kanak
    Wright, Christopher
    Kulkarni, Milind
    Bagchi, Saurabh
    Chaterji, Somali
    [J]. SCIENTIFIC REPORTS, 2019, 9 (1)
  • [2] Scalable Genomic Assembly through Parallel de Bruijn Graph Construction for Multiple K-mers
    Mahadik, Kanak
    Wright, Christopher
    Kulkarni, Milind
    Bagchi, Saurabh
    Chaterji, Somali
    [J]. ACM-BCB' 2017: PROCEEDINGS OF THE 8TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2017, : 425 - 431
  • [3] Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly
    Georganas, Evangelos
    Buluc, Aydin
    Chapman, Jarrod
    Oliker, Leonid
    Rokhsar, Daniel
    Yelick, Katherine
    [J]. SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 437 - 448
  • [4] De Novo Draft Genome Assembly Using Fuzzy K-mers
    Healy, John
    Chambers, Desmond
    [J]. BIOTECHNO 2011: THE THIRD INTERNATIONAL CONFERENCE ON BIOINFORMATICS, BIOCOMPUTATIONAL SYSTEMS AND BIOTECHNOLOGIES, 2011, : 104 - 109
  • [5] Parallelized De Bruijn graph construction and simplification for genome assembly
    [J]. Cheng, J.-F. (jiefengcheng@gmail.com), 1600, Chinese Academy of Sciences (24):
  • [6] Joker de Bruijn: Covering k-Mers Using Joker Characters
    Orenstein, Yaron
    Yu, Yun William
    Berger, Bonnie
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2018, 25 (11) : 1171 - 1178
  • [7] HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly
    Limon, Mahfuzer Rahman
    Sharker, Ratul
    Biswas, Sajib
    Rahman, M. Sohel
    [J]. INTERNATIONAL JOURNAL OF GENOMICS, 2017, 2017
  • [8] Genome Polymorphism Detection Through Relaxed de Bruijn Graph Construction
    Fujimoto, M. Stanley
    Lyman, Cole
    Suvorov, Anton
    Bodily, Paul
    Snell, Quinn
    Crandall, Keith
    Bybee, Seth
    Clement, Mark
    [J]. 2017 IEEE 17TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2017, : 212 - 216
  • [9] Joker de Bruijn: Sequence Libraries to Cover All k-mers Using Joker Characters
    Orenstein, Yaron
    Kim, Ryan
    Fordyce, Polly
    Berger, Bonnie
    [J]. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB 2017, 2017, 10229 : 389 - 390
  • [10] deGSM: Memory Scalable Construction Of Large Scale de Bruijn Graph
    Guo, Hongzhe
    Fu, Yilei
    Gao, Yan
    Li, Junyi
    Wang, Yadong
    Liu, Bo
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2021, 18 (06) : 2157 - 2166