Alignment of High-Throughput Sequencing Data Inside In-Memory Databases

Cited by: 3
Authors
Firnkorn, Daniel [1]
Knaup-Gregori, Petra [1]
Bermejo, Justo Lorenzo [1]
Ganzinger, Matthias [1]
Affiliation
[1] Inst Med Biometry & Informat, Heidelberg, Germany
Source
Keywords
In-Memory-Technology; DNA-Alignment; HANA; high-throughput sequencing; stored procedures;
DOI
10.3233/978-1-61499-432-9-476
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In the era of high-throughput DNA sequencing, efficient analysis of DNA sequences is of high importance, yet computer-supported DNA analysis remains a time-consuming, computationally intensive task. In this paper we explore the potential of a new in-memory database technology using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To keep the results comparable, MySQL was also run in memory, using its integrated memory engine for database table creation. The stored procedures implement exact and inexact searching of DNA reads within the reference genome GRCh37. Because of technical restrictions in SAP HANA concerning recursion, inexact matching could not be implemented on this platform. Hence, the performance comparison between HANA and MySQL was based on the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which indicates that the new in-memory concepts have high potential and may lead to further development of DNA analysis procedures.
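The exact search that the abstract refers to is, in BWA-style aligners, typically a backward search over a Burrows-Wheeler-transformed reference. The following is a minimal Python sketch of that idea, not the authors' stored procedures and not BWA's own code. The function names `build_fm_index` and `exact_search` are hypothetical, the suffix array is built naively, and the reference is a toy string rather than GRCh37.

```python
from collections import Counter


def build_fm_index(reference):
    """Build a toy FM-index: suffix array, C table and Occ table (illustrative only)."""
    text = reference + "$"  # '$' sentinel sorts before A, C, G, T
    # Naive suffix array construction; acceptable only for tiny inputs.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for suffix 0).
    bwt = "".join(text[i - 1] for i in suffix_array)
    # c_table[ch]: number of characters in text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == b else 0)
    return suffix_array, c_table, occ


def exact_search(read, index):
    """Return sorted 0-based start positions of exact matches of read."""
    suffix_array, c_table, occ = index
    lo, hi = 0, len(suffix_array)  # half-open interval of matching suffixes
    for ch in reversed(read):      # backward search: last character first
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


index = build_fm_index("GATTACAGATTACA")
print(exact_search("ATTA", index))  # [1, 8]
```

Each step of the loop narrows the interval using only table lookups, so the exact search is iterative. Inexact matching, by contrast, is usually formulated as a recursive exploration of mismatches over the same interval, which is the part the abstract reports could not be implemented in HANA.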
Pages: 476 - 480
Page count: 5
Related papers
50 in total
  • [1] AritPIM: High-Throughput In-Memory Arithmetic
    Leitersdorf O.
    Leitersdorf D.
    Gal J.
    Dahan M.
    Ronen R.
    Kvatinsky S.
    IEEE Transactions on Emerging Topics in Computing, 2023, 11 (03): 720 - 735
  • [2] SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data
    Abuin, Jose M.
    Pichel, Juan C.
    Pena, Tomas F.
    Amigo, Jorge
    PLOS ONE, 2016, 11 (05):
  • [3] A novel multi-alignment pipeline for high-throughput sequencing data
    Huang, Shunping
    Holt, James
    Kao, Chia-Yu
    McMillan, Leonard
    Wang, Wei
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2014,
  • [4] Genome reassembly with high-throughput sequencing data
    Parrish, Nathaniel
    Sudakov, Benjamin
    Eskin, Eleazar
    BMC GENOMICS, 2013, 14
  • [5] Tools for mapping high-throughput sequencing data
    Fonseca, Nuno A.
    Rung, Johan
    Brazma, Alvis
    Marioni, John C.
    BIOINFORMATICS, 2012, 28 (24) : 3169 - 3177
  • [6] Genome reassembly with high-throughput sequencing data
    Nathaniel Parrish
    Benjamin Sudakov
    Eleazar Eskin
    BMC Genomics, 14
  • [7] Compression of Structured High-Throughput Sequencing Data
    Campagne, Fabien
    Dorff, Kevin C.
    Chambwe, Nyasha
    Robinson, James T.
    Mesirov, Jill P.
    PLOS ONE, 2013, 8 (11):
  • [8] A High-Throughput In-Memory Index, Durable on Flash-based SSD
    Kissinger, Thomas
    Schlegel, Benjamin
    Boehm, Matthias
    Habich, Dirk
    Lehner, Wolfgang
    SIGMOD RECORD, 2012, 41 (03) : 44 - 50
  • [9] FourierPIM: High-throughput in-memory Fast Fourier Transform and polynomial multiplication
    Leitersdorf, Orian
    Boneh, Yahav
    Gazit, Gonen
    Ronen, Ronny
    Kvatinsky, Shahar
    Memories - Materials, Devices, Circuits and Systems, 2023, 4
  • [10] Comparison of high-throughput sequencing data compression tools
    Numanagic, Ibrahim
    Bonfield, James K.
    Hach, Faraz
    Voges, Jan
    Ostermann, Joern
    Alberti, Claudio
    Mattavelli, Marco
    Sahinalp, S. Cenk
    NATURE METHODS, 2016, 13 (12) : 1005 - +