A Parallel Algorithm for Compression of Big Next-Generation Sequencing Datasets

Cited by: 1
Authors
Perez, Sandino Vargas [1]
Saeed, Fahad [2]
Affiliations
[1] Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008 USA
[2] Western Michigan Univ, Dept Elect & Comp Engn, Kalamazoo, MI 49008 USA
Keywords
Next-Generation Sequencing; parallel implementation; DSRC; MPI; big data; FASTQ; FASTQ format
DOI
10.1109/Trustcom.2015.632
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The volume of data produced by high-throughput Next-Generation Sequencing (NGS) techniques poses challenges for the storage, analysis, and transmission of massive datasets. One solution for storage and transmission is compression with specialized algorithms. Existing specialized algorithms scale poorly as dataset size grows, and the best available solutions can take hours to compress gigabytes of data. For peta-scale datasets, compression and decompression with these techniques are prohibitively expensive in time and energy. In this paper we introduce paraDSRC, a parallel implementation of the DNA Sequence Reads Compression (DSRC) application based on a message-passing model, which reduces compression time complexity by a factor of O(1/p), where p is the number of processing units. Our experimental results show that paraDSRC achieves compression times 43% to 99% faster than DSRC and compression throughputs of up to 8.4 GB/s on a moderate-size cluster. For many of the datasets used in our experiments, super-linear speedups were registered, making the implementation strongly scalable. We also show that paraDSRC is more than 25.6x faster than comparable parallel compression algorithms.
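The idea of dividing compression work among p processing units can be illustrated with a minimal sketch. This is not the paraDSRC implementation: it assumes 4-line FASTQ records, uses a thread pool in place of MPI ranks, and substitutes zlib for DSRC's FASTQ-specific coding. The function names (`split_records`, `compress_chunk`, `parallel_compress`, `decompress_all`) are hypothetical and chosen only for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Assumption: each FASTQ record is exactly 4 lines
# (title, sequence, '+' separator, quality string).
LINES_PER_RECORD = 4


def split_records(fastq_text, p):
    """Split FASTQ text into at most p chunks without cutting a record."""
    lines = fastq_text.splitlines(keepends=True)
    if len(lines) % LINES_PER_RECORD != 0:
        raise ValueError("input is not a whole number of 4-line records")
    n_records = len(lines) // LINES_PER_RECORD
    base, extra = divmod(n_records, p)
    chunks, start = [], 0
    for rank in range(p):
        # The first `extra` workers take one additional record.
        count = base + (1 if rank < extra else 0)
        end = start + count * LINES_PER_RECORD
        if count:
            chunks.append("".join(lines[start:end]))
        start = end
    return chunks


def compress_chunk(chunk):
    # zlib is a stand-in; DSRC uses its own FASTQ-aware compression.
    return zlib.compress(chunk.encode("ascii"))


def parallel_compress(fastq_text, p):
    """Compress each record-aligned chunk independently with p workers."""
    chunks = split_records(fastq_text, p)
    with ThreadPoolExecutor(max_workers=p) as pool:
        return list(pool.map(compress_chunk, chunks))


def decompress_all(blocks):
    """Concatenate the decompressed blocks in their original order."""
    return "".join(zlib.decompress(b).decode("ascii") for b in blocks)
```

Because every chunk is compressed independently, each worker handles roughly 1/p of the records, which is the intuition behind the O(1/p) factor stated in the abstract. A real message-passing version would also need coordinated parallel file I/O, which this sketch leaves out.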
Pages: 196-201
Page count: 6