Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0

Cited by: 259
Authors
The, Matthew [1]
MacCoss, Michael J. [2]
Noble, William S. [2, 3]
Käll, Lukas [1]
Affiliations
[1] KTH Royal Inst Technol, Sci Life Lab, Sch Biotechnol, Box 1031, S-17121 Solna, Sweden
[2] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[3] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Funding
U.S. National Institutes of Health (NIH)
Keywords
Mass spectrometry - LC-MS/MS; Statistical analysis; Data processing and analysis; Protein inference; Large scale studies; TANDEM MASS-SPECTROMETRY; SHOTGUN PROTEOMICS; PEPTIDE IDENTIFICATION; SPECTRA; PROBABILITIES; DATABASES; INFERENCE; STRIKE;
DOI
10.1007/s13361-016-1460-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness of the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method in the Percolator package: grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PMID: 24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
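The protein inference idea named in the abstract (group proteins by their theoretical peptide sets, then score each group by its best-scoring peptide) can be illustrated with a minimal Python sketch. This is not Percolator's implementation: the function name `protein_group_q_values`, its inputs, and the toy data are hypothetical, the FDR estimate is a plain decoy/target ratio, and the handling of peptides shared between different groups is left out.

```python
from collections import defaultdict


def protein_group_q_values(theoretical, peptide_scores, decoys):
    """Toy protein-level q values (hypothetical helper, not Percolator's API).

    theoretical:    dict protein id -> set of theoretical peptide sequences
    peptide_scores: dict observed peptide -> score (higher is better)
    decoys:         set of protein ids that come from the decoy database
    """
    # 1. Group proteins that share an identical set of theoretical peptides.
    groups = defaultdict(list)
    for protein, peptides in theoretical.items():
        groups[frozenset(peptides)].append(protein)

    # 2. Score each group by its single best-scoring observed peptide.
    scored = []
    for peptides, proteins in groups.items():
        observed = [peptide_scores[p] for p in peptides if p in peptide_scores]
        if not observed:
            continue  # no evidence for this group
        is_decoy = all(p in decoys for p in proteins)
        scored.append((max(observed), tuple(sorted(proteins)), is_decoy))

    # 3. Rank by score and estimate the FDR at each rank as #decoys / #targets.
    scored.sort(key=lambda row: row[0], reverse=True)
    fdrs, n_targets, n_decoys = [], 0, 0
    for _, _, is_decoy in scored:
        if is_decoy:
            n_decoys += 1
        else:
            n_targets += 1
        fdrs.append(n_decoys / max(n_targets, 1))

    # 4. q value = smallest FDR at this rank or any lower-ranked position.
    q_values, running_min = [], float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()

    # Report target groups only.
    return [
        (proteins, score, q)
        for (score, proteins, is_decoy), q in zip(scored, q_values)
        if not is_decoy
    ]


# Toy data: P1 and P2 have identical theoretical peptides and form one group.
example = protein_group_q_values(
    theoretical={
        "P1": {"AAK", "CCK"},
        "P2": {"AAK", "CCK"},
        "P3": {"DDK"},
        "decoy_P4": {"KAA"},
        "P5": {"EEK"},
    },
    peptide_scores={"AAK": 3.0, "CCK": 1.0, "DDK": 2.5, "KAA": 2.0, "EEK": 0.5},
    decoys={"decoy_P4"},
)
for proteins, score, q in example:
    print(proteins, score, round(q, 3))
```

In this toy run the P1/P2 group and P3 rank above the only decoy, so they receive a q value of 0, while P5 ranks below it and receives 1/3.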
Pages: 1719-1727
Page count: 9