Alignment of High-Throughput Sequencing Data Inside In-Memory Databases

Times cited: 3

Authors:

Firnkorn, Daniel [1]

Knaup-Gregori, Petra [1]

Bermejo, Justo Lorenzo [1]

Ganzinger, Matthias [1]

Affiliation:

[1] Inst Med Biometry & Informat, Heidelberg, Germany

Source:

E-HEALTH - FOR CONTINUITY OF CARE | 2014 / Vol. 205

Keywords:

In-Memory-Technology; DNA-Alignment; HANA; high-throughput sequencing; stored procedures

DOI:

10.3233/978-1-61499-432-9-476

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification code:

081203; 0835

Abstract:

In times of high-throughput DNA sequencing, high-performance analysis of DNA sequences is of great importance, yet computer-supported DNA analysis is still a time-consuming task. In this paper we explore the potential of a new in-memory database technology, SAP's High Performance Analytic Appliance (HANA). We focus on read alignment, one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To keep the results comparable, MySQL was also run in memory, using its integrated memory engine to create the database tables. The stored procedures implement exact and inexact searching of DNA reads within the reference genome GRCh37. Because of technical restrictions on recursion in SAP HANA, the inexact matching problem could not be implemented on that platform; the performance comparison between HANA and MySQL was therefore based on the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which indicates a high potential in the new in-memory concepts that may lead to further developments of DNA analysis procedures in the future.
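The exact and inexact read searches mentioned in the abstract rest on the Burrows-Wheeler transform that BWA uses. The following is a minimal Python sketch for illustration only, not the authors' HANA or MySQL stored procedures: it builds a toy FM-index by naive suffix sorting (far too slow for a genome such as GRCh37), finds exact matches by backward search, and runs a simplified inexact search that tolerates substitutions only (BWA additionally handles gaps and prunes the search with a lower-bound array). The inexact variant is written as a depth-first recursion, which shows where a platform restriction on recursion becomes an obstacle. All function names are hypothetical.

```python
def build_fm_index(text):
    """Build a naive FM-index (suffix array, C table, Occ table).

    Assumes text uses only A/C/G/T; a '$' sentinel is appended.
    Plain sorting of suffixes is fine for toy inputs only.
    """
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    # C[c]: number of characters in text that sort before c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def exact_search(pattern, index):
    """Backward search: return sorted start positions of exact matches."""
    sa, C, occ = index
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def inexact_search(pattern, index, max_mismatches):
    """Recursive search allowing up to max_mismatches substitutions."""
    sa, C, occ = index
    bases = [c for c in C if c != "$"]
    hits = set()

    def recurse(i, lo, hi, mismatches_left):
        # i: next pattern position to extend, scanning right to left
        if i < 0:
            hits.update(sa[lo:hi])
            return
        for b in bases:
            cost = 0 if b == pattern[i] else 1
            if cost > mismatches_left:
                continue
            new_lo = C[b] + occ[b][lo]
            new_hi = C[b] + occ[b][hi]
            if new_lo < new_hi:  # non-empty interval: keep extending
                recurse(i - 1, new_lo, new_hi, mismatches_left - cost)

    recurse(len(pattern) - 1, 0, len(sa), max_mismatches)
    return sorted(hits)
```

For example, with index = build_fm_index("ACGTACGTTACG"), exact_search("ACG", index) yields the positions 0, 4 and 9, while inexact_search("CGTA", index, 1) also reports position 5, where "CGTT" differs from the read by one base. The recursion depth grows with the read length, so an environment that forbids recursive procedures would need an explicit stack instead.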


Pages: 476-480

Number of pages: 5
