Cloud Parallel Processing of Tandem Mass Spectrometry Based Proteomics Data

Cited by: 26
Authors
Mohammed, Yassene [1, 2, 3]
Mostovenko, Ekaterina [1]
Henneman, Alex A. [1]
Marissen, Rob J. [1]
Deelder, Andre M. [1]
Palmblad, Magnus [1]
Affiliations
[1] Leiden Univ, Dept Parasitol, Med Ctr, Biomol Mass Spectrometry Unit, NL-2300 RA Leiden, Netherlands
[2] Leibniz Univ Hannover, Distributed Comp Secur Grp, D-30167 Hannover, Germany
[3] Leibniz Univ Hannover, L3S, D-30167 Hannover, Germany
Keywords
proteomics; mass spectrometry; scientific workflow; data decomposition; PEPTIDE IDENTIFICATION; SPECTRA; MAPREDUCE; SEQUENCES; XTANDEM; MS/MS; ETD
DOI
10.1021/pr300561q
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
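The data decomposition idea in the abstract can be illustrated with a minimal sketch. This is not the authors' mzXML/pepXML code: scans are represented as plain dictionaries rather than parsed mzXML, and `mock_search` is a hypothetical stand-in for an unmodified search engine such as X!Tandem or SpectraST. The function names `decompose`, `recompose`, and `parallel_search` are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(scans, n_parts):
    """Split a list of scans into contiguous chunks of near-equal size."""
    n_parts = max(1, min(n_parts, len(scans)))
    size, extra = divmod(len(scans), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks each take one additional scan.
        end = start + size + (1 if i < extra else 0)
        chunks.append(scans[start:end])
        start = end
    return chunks


def mock_search(chunk):
    """Hypothetical stand-in for a search engine run on one chunk."""
    return [{"scan": s["num"], "hit": f"PEPTIDE_{s['num']}"} for s in chunk]


def recompose(partial_results):
    """Merge per-chunk results into one list ordered by scan number."""
    merged = [r for part in partial_results for r in part]
    return sorted(merged, key=lambda r: r["scan"])


def parallel_search(scans, n_parts=4):
    """Decompose, search each chunk concurrently, then recompose."""
    chunks = decompose(scans, n_parts)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partial = list(pool.map(mock_search, chunks))
    return recompose(partial)


if __name__ == "__main__":
    scans = [{"num": i} for i in range(1, 11)]
    print([len(c) for c in decompose(scans, 3)])
    print(len(parallel_search(scans, n_parts=3)))
```

The search function is left untouched and only the input and output are split and merged around it, which is the property the abstract attributes to data decomposition. In a cloud setting each chunk would go to a separate machine rather than a thread.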
Pages: 5101 - 5108
Number of pages: 8
Related papers
50 items in total
  • [21] Maximizing Tandem Mass Spectrometry Acquisition Rates for Shotgun Proteomics
    Trujillo, Edna A.
    Hebert, Alexander S.
    Brademan, Dain R.
    Coon, Joshua J.
    ANALYTICAL CHEMISTRY, 2019, 91 (20) : 12625 - 12629
  • [22] Automated comparative proteomics based on multiplex tandem mass spectrometry and stable isotope labeling
    Zhang, GA
    Neubert, TA
    MOLECULAR & CELLULAR PROTEOMICS, 2006, 5 (02) : 401 - 411
  • [23] Nearline acquisition and processing of liquid chromatography-tandem mass spectrometry data
    Neumann, Steffen
    Thum, Andrea
    Boettcher, Christoph
    METABOLOMICS, 2013, 9 (01) : S84 - S91
  • [24] Decision tree–driven tandem mass spectrometry for shotgun proteomics
    Danielle L Swaney
    Graeme C McAlister
    Joshua J Coon
    Nature Methods, 2008, 5 : 959 - 964
  • [25] Nearline acquisition and processing of liquid chromatography-tandem mass spectrometry data
    Steffen Neumann
    Andrea Thum
    Christoph Böttcher
    Metabolomics, 2013, 9 : 84 - 91
  • [26] EpiProfile 2.0: A Computational Platform for Processing Epi-Proteomics Mass Spectrometry Data
    Yuan, Zuo-Fei
    Sidoli, Simone
    Marchione, Dylan M.
    Simithy, Johayra
    Janssen, Kevin A.
    Szurgot, Mary R.
    Garcia, Benjamin A.
    JOURNAL OF PROTEOME RESEARCH, 2018, 17 (07) : 2533 - 2541
  • [27] A bioinformatics approach for mass spectrometry data processing: Applications to proteomics and small molecule analysis
    Sonderegger, M
    Staniszewski, K
    Meyers, A
    Siuzdak, G
    SPECTROSCOPY-AN INTERNATIONAL JOURNAL, 2002, 16 (02) : 81 - 87
  • [28] Bioinformatics Methods for Mass Spectrometry-Based Proteomics Data Analysis
    Chen, Chen
    Hou, Jie
    Tanner, John J.
    Cheng, Jianlin
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2020, 21 (08)
  • [29] The mzIdentML Data Standard for Mass Spectrometry-Based Proteomics Results
    Jones, Andrew R.
    Eisenacher, Martin
    Mayer, Gerhard
    Kohlbacher, Oliver
    Siepen, Jennifer
    Hubbard, Simon J.
    Selley, Julian N.
    Searle, Brian C.
    Shofstahl, James
    Seymour, Sean L.
    Julian, Randall
    Binz, Pierre-Alain
    Deutsch, Eric W.
    Hermjakob, Henning
    Reisinger, Florian
    Griss, Johannes
    Vizcaino, Juan Antonio
    Chambers, Matthew
    Pizarro, Angel
    Creasy, David
    MOLECULAR & CELLULAR PROTEOMICS, 2012, 11 (07)
  • [30] Bioinformatics analysis of mass spectrometry-based proteomics data sets
    Kumar, Chanchal
    Mann, Matthias
    FEBS LETTERS, 2009, 583 (11) : 1703 - 1712