An automated proteomic data analysis workflow for mass spectrometry

Cited by: 13
Authors
Pendarvis, Ken [1]
Kumar, Ranjit [1, 2]
Burgess, Shane C. [1, 2, 3, 4]
Nanduri, Bindu [1, 2]
Affiliations
[1] Mississippi State Univ, Inst Digital Biol, Mississippi State, MS 39762 USA
[2] Mississippi State Univ, Coll Vet Med, Mississippi State, MS 39762 USA
[3] Mississippi State Univ, Mississippi Agr & Forestry Expt Stn, Mississippi State, MS 39762 USA
[4] Mississippi State Univ, Life Sci & Biotechnol Inst, Mississippi State, MS 39762 USA
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Funding
National Science Foundation (USA);
Keywords
SUBMINIMUM INHIBITORY CONCENTRATIONS; PROFESSIONAL ANTIGEN PRESENTATION; PROTEIN IDENTIFICATION; SHOTGUN PROTEOMICS; EXPRESSION; SPECTRA; TANDEM; MODEL; DATABASES;
DOI
10.1186/1471-2105-10-S11-S17
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Mass spectrometry-based protein identification methods are fundamental to proteomics. Biological experiments are usually performed in replicates, and proteomic analyses generate huge datasets that need to be integrated and quantitatively analyzed. The Sequest™ search algorithm is commonly used for identifying peptides and proteins from two-dimensional liquid chromatography electrospray ionization tandem mass spectrometry (2-D LC ESI MS²) data. A number of proteomic pipelines that facilitate high-throughput "post data acquisition analysis" are described in the literature. However, these pipelines need to be updated to accommodate rapidly evolving data analysis methods. Here, we describe a proteomic data analysis pipeline that specifically addresses two main issues pertinent to protein identification and differential expression analysis: 1) estimation of the probability of peptide and protein identifications and 2) non-parametric statistics for protein differential expression analysis. Our proteomic analysis workflow analyzes replicate datasets from a single experimental paradigm to generate a list of identified proteins with their probabilities and significant changes in protein expression, using parametric and non-parametric statistics.

Results: The input for our workflow is Bioworks™ 3.2 Sequest (or a later version, including cluster) output in XML format. We use a decoy database approach to assign probabilities to peptide identifications. The user has the option to select "quality thresholds" on peptide identifications based on the P value. We also estimate the probability of each protein identification. Proteins identified with peptides at a user-specified threshold value from biological experiments are grouped as either control or treatment for further analysis in ProtQuant. ProtQuant uses a parametric (ANOVA) method to calculate differences in protein expression based on the quantitative measure Sigma Xcorr. Alternatively, ProtQuant output can be further processed using non-parametric Monte Carlo resampling statistics to calculate P values for differential expression. Correction for multiple testing of ANOVA and resampling P values is done using Benjamini and Hochberg's method. The results of these statistical analyses are then combined into a single output file containing a comprehensive protein list with probabilities and differential expression analysis, associated P values, and resampling statistics.

Conclusion: For biologists carrying out proteomics by mass spectrometry, our workflow facilitates automated, easy-to-use analyses of Bioworks (3.2 or later versions) data. All the methods used in the workflow are peer-reviewed, and as such the results of our workflow are compliant with proteomic data submission guidelines to public proteomic data repositories, including PRIDE. Our workflow is a necessary intermediate step that is required to link proteomics data to biological knowledge for generating testable hypotheses.
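
The resampling and multiple-testing steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the protein identifiers, the per-replicate Sigma Xcorr values, and the choice of a two-sided permutation test on the difference in group means are all assumptions made for illustration. Only the Benjamini and Hochberg adjustment itself follows the standard published procedure.

```python
import random
from statistics import mean


def resampling_p_value(control, treatment, n_resamples=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means.

    Assumption: group labels are exchangeable under the null hypothesis.
    """
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign replicates to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-replicate Sigma Xcorr values: (control, treatment).
    proteins = {
        "protA": ([12.1, 11.8, 12.5], [20.3, 19.9, 21.0]),
        "protB": ([5.0, 5.4, 4.9], [5.1, 5.2, 5.0]),
        "protC": ([8.2, 7.9, 8.8], [3.1, 2.7, 3.5]),
    }
    names = list(proteins)
    raw = [resampling_p_value(*proteins[name]) for name in names]
    for name, p, q in zip(names, raw, benjamini_hochberg(raw)):
        print(f"{name}\tp={p:.4f}\tBH-adjusted={q:.4f}")
```

With only three replicates per group, the number of distinct relabelings is small, so the resampled P values are coarse; the sketch is meant to show the order of operations (per-protein resampling, then correction across all proteins), not realistic statistical power.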
Pages: 9