Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly

Cited by: 4
Authors
Guidi, Giulia [1,2]
Selvitopi, Oguz [2]
Ellis, Marquita [1,2]
Oliker, Leonid [2]
Yelick, Katherine [1,2]
Buluc, Aydin [1,2]
Affiliations
[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
[2] Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA
DOI
10.1109/IPDPS49936.2021.00060
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
One of the most computationally intensive tasks in computational biology is de novo genome assembly, the decoding of the sequence of an unknown genome from redundant and erroneous short sequences. A common assembly paradigm identifies overlapping sequences, simplifies their layout, and creates a consensus. Despite the many algorithms proposed in the literature, the efficient assembly of large genomes is still an open problem. In this work, we introduce new distributed-memory parallel algorithms for the overlap detection and layout simplification steps of de novo genome assembly, and implement them in the diBELLA 2D pipeline. Our distributed-memory algorithms for both overlap detection and layout simplification are based on linear-algebra operations over semirings using 2D distributed sparse matrices. Our layout step consists of performing a transitive reduction from the overlap graph to a string graph. We provide a detailed communication analysis of the main stages of our new algorithms. diBELLA 2D achieves near-linear scaling with over 80% parallel efficiency for the human genome, reducing the runtime for overlap detection by 1.2-1.3x for the human genome and 1.5-1.9x for C. elegans compared to the state of the art. Our transitive reduction algorithm outperforms an existing distributed-memory implementation by 10.5-13.3x for the human genome and 18-29x for C. elegans. Our work paves the way for efficient de novo assembly of large genomes using long reads in distributed memory.
Pages: 517-526
Page count: 10
Related papers
  • [1] Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly
    Georganas, Evangelos
    Buluc, Aydin
    Chapman, Jarrod
    Oliker, Leonid
    Rokhsar, Daniel
    Yelick, Katherine
    [J]. SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 437 - 448
  • [2] FSG: Fast String Graph Construction for De Novo Assembly
    Bonizzoni, Paola
    Della Vedova, Gianluca
    Pirola, Yuri
    Previtali, Marco
    Rizzi, Raffaella
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2017, 24 (10) : 953 - 968
  • [3] FSG: Fast String Graph Construction for De Novo Assembly of Reads Data
    Bonizzoni, Paola
    Della Vedova, Gianluca
    Pirola, Yuri
    Previtali, Marco
    Rizzi, Raffaella
    [J]. BIOINFORMATICS RESEARCH AND APPLICATIONS, ISBRA 2016, 2016, 9683 : 27 - 39
  • [4] Faucet: streaming de novo assembly graph construction
    Rozov, Roye
    Goldshlager, Gil
    Halperin, Eran
    Shamir, Ron
    [J]. BIOINFORMATICS, 2018, 34 (01) : 147 - 154
  • [5] Scalable De Novo Genome Assembly Using a Pregel-Like Graph-Parallel System
    Guo, Guimu
    Chen, Hongzhi
    Yan, Da
    Cheng, James
    Chen, Jake Y.
    Chong, Zechen
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2021, 18 (02) : 731 - 744
  • [6] A New Approach for De Bruijn Graph Construction in De Novo Genome Assembling
    de Armas, Elvismary Molina
    Castro, Liester Cruz
    Holanda, Maristela
    Lifschitz, Sergio
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 1842 - 1849
  • [7] Performance Characterization of De Novo Genome Assembly on Leading Parallel Systems
    Ellis, Marquita
    Georganas, Evangelos
    Egan, Rob
    Hofmeyr, Steven
    Buluc, Aydin
    Cook, Brandon
    Oliker, Leonid
    Yelick, Katherine
    [J]. EURO-PAR 2017: PARALLEL PROCESSING, 2017, 10417 : 79 - 91
  • [8] Parallelized De Bruijn graph construction and simplification for genome assembly
    [J]. Cheng, J.-F. (jiefengcheng@gmail.com), 1600, Chinese Academy of Sciences (24):
  • [9] Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers
    Mahadik, Kanak
    Wright, Christopher
    Kulkarni, Milind
    Bagchi, Saurabh
    Chaterji, Somali
    [J]. SCIENTIFIC REPORTS, 2019, 9 (1)