Cloud Parallel Processing of Tandem Mass Spectrometry Based Proteomics Data

Cited by: 26

Authors:

Mohammed, Yassene [1,2,3]

Mostovenko, Ekaterina [1]

Henneman, Alex A. [1]

Marissen, Rob J. [1]

Deelder, Andre M. [1]

Palmblad, Magnus [1]

Affiliations:

[1] Leiden Univ, Dept Parasitol, Med Ctr, Biomol Mass Spectrometry Unit, NL-2300 RA Leiden, Netherlands

[2] Leibniz Univ Hannover, Distributed Comp Secur Grp, D-30167 Hannover, Germany

[3] Leibniz Univ Hannover, L3S, D-30167 Hannover, Germany

Source:

JOURNAL OF PROTEOME RESEARCH | 2012 / Vol. 11 / Issue 10

Keywords:

proteomics; mass spectrometry; scientific workflow; data decomposition; PEPTIDE IDENTIFICATION; SPECTRA; MAPREDUCE; SEQUENCES; XTANDEM; MS/MS; ETD;

DOI:

10.1021/pr300561q

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Subject classification codes:

071010; 081704

Abstract:

Data analysis in mass spectrometry based proteomics struggles to keep pace with advances in instrumentation and the increasing rate of data acquisition. Analyzing these data involves multiple steps that require diverse software, different algorithms and different data formats. The speed and performance of mass spectral search engines are continuously improving, though not necessarily fast enough to meet the challenges posed by the large volumes of data now being acquired. Improving and parallelizing the search algorithms is one possibility; data decomposition offers another, simpler strategy for introducing parallelism. We describe a general method for parallelizing the identification of tandem mass spectra by data decomposition, which keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms, one for decomposing mzXML files and one for recomposing the resulting pepXML files. This makes the approach applicable to different search engines, including those that rely on sequence databases and those that search spectral libraries. We use cloud computing to deliver the computational power, and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics, and we present three scientific workflows for parallel sequence database and spectral library searches using our data decomposition programs together with X!Tandem and SpectraST.
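The data-decomposition idea described in the abstract can be illustrated with a small sketch. This is not the authors' mzXML/pepXML decomposition software: spectra are modelled as plain Python dicts rather than mzXML `<scan>` elements, `toy_search` is a hypothetical stand-in for an unmodified search engine such as X!Tandem, and a local thread pool stands in for cloud worker nodes. The point it shows is that the search function itself is never changed; parallelism comes only from splitting the input and merging the outputs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical data model: a spectrum is a dict with a scan number and a
# list of peaks, standing in for a <scan> element of an mzXML file.
Spectrum = Dict[str, object]
Match = Dict[str, object]


def decompose(spectra: List[Spectrum], n_chunks: int) -> List[List[Spectrum]]:
    """Split spectra into n_chunks contiguous parts whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, extra = divmod(len(spectra), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(spectra[start:end])
        start = end
    return chunks


def toy_search(chunk: List[Spectrum]) -> List[Match]:
    """Hypothetical stand-in for an unmodified search engine run on one chunk.

    It simply scores each spectrum by its number of peaks.
    """
    return [{"scan": s["scan"], "score": float(len(s["peaks"]))} for s in chunk]


def recompose(partial_results: List[List[Match]]) -> List[Match]:
    """Merge per-chunk results into one list ordered by original scan number."""
    merged = [m for part in partial_results for m in part]
    return sorted(merged, key=lambda m: m["scan"])


def parallel_search(
    spectra: List[Spectrum],
    n_chunks: int,
    search: Callable[[List[Spectrum]], List[Match]] = toy_search,
) -> List[Match]:
    """Decompose, search every chunk concurrently, then recompose."""
    chunks = decompose(spectra, n_chunks)
    # Local threads stand in for cloud worker nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partial = list(pool.map(search, chunks))
    return recompose(partial)


if __name__ == "__main__":
    spectra = [{"scan": i, "peaks": [100.0] * (i % 5)} for i in range(1, 11)]
    results = parallel_search(spectra, n_chunks=3)
    print([r["scan"] for r in results])
```

Running the script prints `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]`: ten spectra are split into chunks of 4, 3 and 3, searched independently, and merged back into original scan order. A real implementation working on files would also have to rewrite headers and indices when splitting mzXML and merging pepXML, which this sketch omits.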


Pages: 5101-5108

Number of pages: 8
