Alignment of High-Throughput Sequencing Data Inside In-Memory Databases

Cited by: 3

Authors:

Firnkorn, Daniel [1]

Knaup-Gregori, Petra [1]

Bermejo, Justo Lorenzo [1]

Ganzinger, Matthias [1]

Affiliations:

[1] Inst Med Biometry & Informat, Heidelberg, Germany

Source:

E-HEALTH - FOR CONTINUITY OF CARE | 2014 / Vol. 205

Keywords:

In-Memory-Technology; DNA-Alignment; HANA; high-throughput sequencing; stored procedures;

DOI:

10.3233/978-1-61499-432-9-476

Chinese Library Classification:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

With the rise of high-throughput DNA sequencing techniques, efficient analysis of DNA sequences is of high importance. Computer-supported DNA analysis remains a time-consuming task. In this paper we explore the potential of a new in-memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, using its integrated MEMORY engine for database table creation. The stored procedures cover exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, inexact matching could not be implemented on this platform. Hence, the performance of HANA and MySQL was compared using the execution time of the exact search procedures only. Here, HANA was approximately 27 times faster than MySQL, which suggests that the new in-memory concepts have high potential and may lead to further developments of DNA analysis procedures in the future.
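The abstract does not reproduce the stored procedures themselves. As a rough illustration of the exact search it refers to, the following Python sketch shows the backward search over a Burrows-Wheeler transform that BWA-style exact matching builds on. It is not the authors' HANA or MySQL code; the function names `build_fm_index` and `exact_search` are hypothetical, the index is built naively, and a toy reference string stands in for GRCh37.

```python
from collections import Counter


def build_fm_index(reference: str):
    """Build a suffix array, C table and occurrence table for a small reference.

    Naive construction for illustration only; it would not scale to a genome.
    """
    text = reference + "$"  # sentinel that sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0)
    bwt = "".join(text[i - 1] for i in suffix_array)

    # c_table[ch]: number of characters in text that are strictly smaller than ch
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[ch][i]: number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for key in occ:
            occ[key][i + 1] = occ[key][i] + (1 if key == ch else 0)
    return suffix_array, c_table, occ


def exact_search(read: str, suffix_array, c_table, occ):
    """Return the sorted start positions of all exact occurrences of read."""
    lo, hi = 0, len(suffix_array)  # half-open interval over the suffix array
    for ch in reversed(read):  # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:  # interval is empty: the read does not occur
            return []
    return sorted(suffix_array[i] for i in range(lo, hi))


# Toy usage: index a short reference and look up a read.
sa, c_table, occ = build_fm_index("GATTACAGATTACA")
positions = exact_search("ATTA", sa, c_table, occ)
```

The search is a plain loop with no recursion, which is consistent with the abstract's statement that only exact search could be implemented on both platforms. Inexact matching, which the abstract says required recursion, is not covered by this sketch.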

Pages: 476-480

Page count: 5
