Alignment of High-Throughput Sequencing Data Inside In-Memory Databases

Cited by: 3

Authors:

Firnkorn, Daniel [1]

Knaup-Gregori, Petra [1]

Bermejo, Justo Lorenzo [1]

Ganzinger, Matthias [1]

Affiliations:

[1] Inst Med Biometry & Informat, Heidelberg, Germany

Source:

E-HEALTH - FOR CONTINUITY OF CARE | 2014 / Vol. 205

Keywords:

In-Memory-Technology; DNA-Alignment; HANA; high-throughput sequencing; stored procedures;

DOI:

10.3233/978-1-61499-432-9-476

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

In the era of high-throughput DNA sequencing, efficient analysis of DNA sequences is of high importance, yet computer-supported DNA analysis remains a time-consuming task. In this paper we explore the potential of a new in-memory database technology using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To keep the results comparable, MySQL was also run in memory, using its integrated MEMORY storage engine for table creation. The stored procedures cover exact and inexact searching of DNA reads within the reference genome GRCh37. Because of technical restrictions in SAP HANA concerning recursion, inexact matching could not be implemented on this platform. The performance comparison between HANA and MySQL was therefore based on the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which indicates high potential in the new in-memory concepts and may lead to further development of DNA analysis procedures in the future.
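
The exact search mentioned in the abstract is, in BWA, a backward search over the Burrows-Wheeler transform of the reference. The following is a minimal Python sketch of that general technique, not the authors' stored procedures: the function names `bwt_index` and `exact_search` are hypothetical, the suffix array is built by naive sorting, and the approach is only practical for short toy references rather than GRCh37.

```python
from collections import Counter


def bwt_index(reference):
    """Build a toy index (suffix array, C table, Occ table) for backward search.

    Assumption: naive suffix sorting, fine for short strings only.
    """
    text = reference + "$"  # "$" is a sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0)
    bwt = "".join(text[i - 1] for i in sa)

    # c_table[ch]: number of characters in text strictly smaller than ch
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[ch][i]: number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in counts:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def exact_search(read, index):
    """Return sorted 0-based start positions of read in the reference."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(read):  # process the read from its last character
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:  # interval is empty: no exact match
            return []
    return sorted(sa[lo:hi])


index = bwt_index("GATTACAGATTACA")
hits = exact_search("TTA", index)
```

Each loop iteration narrows the suffix-array interval using only table lookups, which is why the exact search can be written iteratively; inexact matching in BWA is usually formulated recursively, which is the part the abstract reports could not be implemented in HANA.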


Pages: 476-480

Page count: 5
