Cloud Parallel Processing of Tandem Mass Spectrometry Based Proteomics Data

Cited by: 26
Authors
Mohammed, Yassene [1, 2, 3]
Mostovenko, Ekaterina [1]
Henneman, Alex A. [1]
Marissen, Rob J. [1]
Deelder, Andre M. [1]
Palmblad, Magnus [1]
Affiliations
[1] Leiden Univ, Dept Parasitol, Med Ctr, Biomol Mass Spectrometry Unit, NL-2300 RA Leiden, Netherlands
[2] Leibniz Univ Hannover, Distributed Comp Secur Grp, D-30167 Hannover, Germany
[3] Leibniz Univ Hannover, L3S, D-30167 Hannover, Germany
Keywords
proteomics; mass spectrometry; scientific workflow; data decomposition; PEPTIDE IDENTIFICATION; SPECTRA; MAPREDUCE; SEQUENCES; XTANDEM; MS/MS; ETD
DOI
10.1021/pr300561q
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
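The data decomposition idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' mzXML/pepXML programs: spectra are represented as plain scan identifiers, `mock_search` is a hypothetical stand-in for an unmodified search engine such as X!Tandem or SpectraST, and all function names are assumptions made for illustration. The sketch only shows the pattern of splitting the input, searching each part independently, and merging the results in the original order.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(spectra, n_parts):
    """Split a list of spectra into at most n_parts contiguous chunks."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    size, extra = divmod(len(spectra), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks receive one additional spectrum.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(spectra[start:end])
        start = end
    return chunks


def mock_search(chunk):
    """Hypothetical stand-in for running a search engine on one chunk."""
    return [(scan_id, f"hit-for-{scan_id}") for scan_id in chunk]


def recompose(partial_results):
    """Concatenate per-chunk results, preserving chunk order."""
    return [hit for part in partial_results for hit in part]


def parallel_search(spectra, n_parts, search=mock_search):
    """Decompose, search each chunk concurrently, then recompose."""
    chunks = decompose(spectra, n_parts)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # pool.map yields results in the order the chunks were submitted.
        partial = list(pool.map(search, chunks))
    return recompose(partial)


if __name__ == "__main__":
    print(parallel_search(list(range(1, 8)), n_parts=3))
```

Because the search function is passed in as a parameter and never modified, the same wrapper could in principle surround any engine that accepts a subset of spectra, which is the property the abstract attributes to the decomposition approach. A real deployment would replace the thread pool with separate processes or cloud worker nodes.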
Pages: 5101-5108
Page count: 8