Alignment of High-Throughput Sequencing Data Inside In-Memory Databases

Cited by: 3
Authors
Firnkorn, Daniel [1]
Knaup-Gregori, Petra [1]
Bermejo, Justo Lorenzo [1]
Ganzinger, Matthias [1]
Affiliation
[1] Inst Med Biometry & Informat, Heidelberg, Germany
Source
Keywords
In-Memory-Technology; DNA-Alignment; HANA; high-throughput sequencing; stored procedures
DOI
10.3233/978-1-61499-432-9-476
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In the era of high-throughput DNA sequencing, efficient analysis of DNA sequences is of great importance, yet computer-supported DNA analysis remains a time-consuming task. In this paper we explore the potential of a new in-memory database technology using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment, one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, using its integrated memory engine for database table creation. The stored procedures implement exact and inexact searching of DNA reads within the reference genome GRCh37. Owing to technical restrictions in SAP HANA concerning recursion, inexact matching could not be implemented on this platform. Hence, the performance of HANA and MySQL was compared using the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which suggests that the new in-memory concepts have high potential and may lead to further developments of DNA analysis procedures in the future.
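To illustrate the kind of exact search the abstract refers to, the following is a minimal Python sketch of backward search over a Burrows-Wheeler transform, the general technique underlying BWA. It is not the authors' stored procedure code, which this record does not reproduce. The function names `build_fm_index` and `exact_search` are hypothetical, and the naive suffix-array construction is an assumption that is only practical for short toy references, not for a genome the size of GRCh37.

```python
def build_fm_index(reference):
    """Build a toy FM-index (suffix array, C table, Occ table) for a reference.

    Hypothetical helper for illustration; uses a naive suffix sort.
    """
    text = reference + "$"  # "$" is a sentinel that sorts before A, C, G, T
    # Suffix array: start positions of all suffixes in lexicographic order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in sa)

    alphabet = sorted(set(text))
    # c_table[c]: number of characters in the text strictly smaller than c.
    c_table, total = {}, 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)

    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, c_table, occ


def exact_search(read, sa, c_table, occ):
    """Return sorted 0-based positions where `read` occurs exactly."""
    if not read:
        return []
    lo, hi = 0, len(sa)  # half-open interval of matching suffix-array rows
    # Process the read from its last character to its first.
    for ch in reversed(read):
        if ch not in c_table:
            return []  # character never appears in the reference
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []  # interval became empty: no exact match
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    sa, c_table, occ = build_fm_index("GATTACAGATTACA")
    print(exact_search("ATTA", sa, c_table, occ))  # [1, 8]
```

Because exact search narrows a single interval in a simple loop, it needs no recursion. Inexact matching in BWA-style aligners is commonly formulated as a recursive exploration of mismatches and gaps, which is consistent with the abstract's remark that recursion limits prevented the inexact variant on HANA.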
Pages: 476-480
Page count: 5