GPU-Accelerated Large-Scale Genome Assembly

Cited by: 7
Authors
Goswami, Sayan [1]
Lee, Kisung [1]
Shams, Shayan [1]
Park, Seung-Jong [1]
Affiliations
[1] Louisiana State Univ, Div Comp Sci & Engn, Ctr Computat & Technol, Baton Rouge, LA 70803 USA
Keywords
Genomics; Computational biology; Memory management; Big data; Parallel processing; Short-read alignment
DOI
10.1109/IPDPS.2018.00091
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Spurred by a widening gap between hardware accelerators and traditional processors, numerous bioinformatics applications have harnessed the computing power of GPUs and reported substantial performance improvements compared to their CPU-based counterparts. However, most of these GPU-based applications focus only on the read alignment problem, while the field of de novo assembly still relies mostly on CPU-based solutions. This is primarily due to the nature of the assembly workload, which is not only compute-intensive but also extremely data-intensive. Such workloads require large memories, making it difficult to adapt them to use GPUs with their limited memory capacities. To the best of our knowledge, no GPU-based assembler reported in the recent literature has attempted to assemble datasets larger than a few tens of gigabytes, whereas real sequence datasets are often several hundreds of gigabytes in size. In this paper, we present a new GPU-accelerated genome assembler called LaSAGNA, which can assemble large-scale sequence datasets using a single GPU by building string graphs from approximate all-pair overlaps. LaSAGNA can also run on multiple GPUs across multiple compute nodes connected by a high-speed network to expedite the assembly process. To utilize the limited memory on GPUs efficiently, LaSAGNA uses a semi-streaming approach that makes at most a logarithmic number of passes over the input data based on the available memory. Moreover, we propose a two-level streaming model, from disk to host memory and from host memory to device memory, to minimize disk I/O. Using LaSAGNA, we can assemble a 400 GB human genome dataset on a single NVIDIA K40 GPU in 17 hours, and in a little over 5 hours on an 8-node cluster of NVIDIA K20s.
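The two-level streaming model mentioned in the abstract can be illustrated with a minimal sketch. This is not LaSAGNA's code: the function names, the record-count capacities (standing in for byte budgets), and the plain Python callable that stands in for a GPU kernel are all assumptions made for illustration. The point it shows is that each record is read from the source once into a host-sized chunk, and each host chunk is then consumed in smaller device-sized batches.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def read_host_chunks(records: Iterable[str], host_capacity: int) -> Iterator[List[str]]:
    """Level 1 (hypothetical): stream records from disk into host-sized chunks."""
    chunk: List[str] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == host_capacity:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly short, chunk
        yield chunk


def device_batches(host_chunk: List[str], device_capacity: int) -> Iterator[List[str]]:
    """Level 2 (hypothetical): split one host chunk into device-sized batches."""
    for start in range(0, len(host_chunk), device_capacity):
        yield host_chunk[start:start + device_capacity]


def two_level_stream(
    records: Iterable[str],
    host_capacity: int,
    device_capacity: int,
    process_batch: Callable[[List[str]], T],
) -> List[T]:
    """Read each record once, then feed it to process_batch in small batches.

    process_batch is a stand-in for work that would run on the GPU.
    """
    results: List[T] = []
    for host_chunk in read_host_chunks(records, host_capacity):
        for batch in device_batches(host_chunk, device_capacity):
            results.append(process_batch(batch))
    return results


# Usage: five toy reads, a host buffer of 4 records, a device buffer of 2.
reads = ["ACGT", "GGCA", "TTAG", "CCGA", "ATAT"]
batch_sizes = two_level_stream(reads, host_capacity=4, device_capacity=2, process_batch=len)
```

With these toy sizes, the source is scanned once while the batches handed to the stand-in kernel never exceed the device capacity, which is the property the abstract attributes to streaming from disk to host memory and from host memory to device memory.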
Pages: 814-824
Number of pages: 11