A Parallel Algorithm for Compression of Big Next-Generation Sequencing Datasets

Cited by: 1
Authors
Perez, Sandino Vargas [1]
Saeed, Fahad [2]
Affiliations
[1] Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008 USA
[2] Western Michigan Univ, Dept Elect & Comp Engn, Kalamazoo, MI 49008 USA
Keywords
Next-Generation Sequencing; parallel implementation; DSRC; MPI; big data; FASTQ; FORMAT
DOI
10.1109/Trustcom.2015.632
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The volume of big data produced by high-throughput Next-Generation Sequencing (NGS) techniques presents various challenges, such as the storage, analysis, and transmission of massive datasets. One solution for storing and transmitting the data is compression with specialized compression algorithms. Existing specialized algorithms scale poorly as dataset size increases, and the best available solutions can take hours to compress gigabytes of data. Compressing and decompressing peta-scale datasets with these techniques is prohibitively expensive in terms of time and energy. In this paper we introduce paraDSRC, a parallel implementation of the DNA Sequence Reads Compression (DSRC) application that uses a message passing model and reduces the compression time complexity by a factor of O(1/p), where p is the number of processing units. Our experimental results show that paraDSRC achieves compression times that are 43% to 99% faster than DSRC and compression throughputs of up to 8.4 GB/s on a moderate-size cluster. For many of the datasets used in our experiments, super-linear speedups were registered, making the implementation strongly scalable. We also show that paraDSRC is more than 25.6x faster than comparable parallel compression algorithms.
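As a rough illustration of the O(1/p) idea in the abstract, the sketch below divides FASTQ records into p contiguous blocks so that each processing unit handles about n/p records and compresses its block independently. This is not the paraDSRC implementation: the function names (`split_records`, `compress_block`, `decompress_blocks`) are hypothetical, `zlib` stands in for the DSRC codec, and a plain loop simulates the p ranks in place of real message passing.

```python
import zlib


def split_records(lines, p):
    """Group FASTQ lines into 4-line records and cut them into p contiguous blocks.

    Block sizes differ by at most one record, so each simulated rank
    receives roughly n/p of the n records.
    """
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    base, extra = divmod(len(records), p)
    blocks, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        blocks.append(records[start:start + size])
        start += size
    return blocks


def compress_block(block):
    """Compress one block independently (zlib is a stand-in, not DSRC)."""
    text = "".join(line for record in block for line in record)
    return zlib.compress(text.encode("ascii"))


def decompress_blocks(blobs):
    """Decompress the blocks in rank order and concatenate the text."""
    return "".join(zlib.decompress(blob).decode("ascii") for blob in blobs)


# Ten synthetic FASTQ records (assumed data, for illustration only).
fastq_lines = []
for i in range(10):
    fastq_lines += [f"@read{i}\n", "ACGTACGTAC\n", "+\n", "IIIIIIIIII\n"]

blocks = split_records(fastq_lines, p=4)
# In a message passing setting each rank would compress only its own block;
# here the ranks are simulated one after another.
blobs = [compress_block(block) for block in blocks]
```

Because each block is compressed on its own, the blocks can be decompressed in rank order and concatenated to recover the original file, which is what makes this kind of partitioning amenable to parallel execution.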
Pages: 196-201
Number of pages: 6