Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0

Cited by: 259
Authors
The, Matthew [1]
MacCoss, Michael J. [2]
Noble, William S. [2,3]
Kall, Lukas [1]
Affiliations
[1] KTH Royal Inst Technol, Sci Life Lab, Sch Biotechnol, Box 1031, S-17121 Solna, Sweden
[2] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[3] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Funding
National Institutes of Health (USA);
Keywords
Mass spectrometry - LC-MS/MS; Statistical analysis; Data processing and analysis; Protein inference; Large scale studies; TANDEM MASS-SPECTROMETRY; SHOTGUN PROTEOMICS; PEPTIDE IDENTIFICATION; SPECTRA; PROBABILITIES; DATABASES; INFERENCE; STRIKE;
DOI
10.1007/s13361-016-1460-7
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness of the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method in the Percolator package: grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PMID: 24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
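The protein inference idea named in the abstract (group proteins that share the same set of theoretical peptides, score each group by its best-scoring peptide, then attach q values) can be illustrated with a minimal Python sketch. This is not Percolator's code: the function name `protein_q_values`, the input dictionaries, and the simple decoys-over-targets FDR estimate are all assumptions made for illustration, and the actual package may differ in every detail.

```python
from collections import defaultdict


def protein_q_values(peptide_scores, protein_peptides, is_decoy):
    """Hypothetical helper, not part of Percolator.

    peptide_scores:   dict peptide -> score (higher is better), identified peptides only
    protein_peptides: dict protein -> iterable of its theoretical peptides
    is_decoy:         dict protein -> True if the protein is a decoy
    """
    # 1. Group proteins that have identical sets of theoretical peptides.
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)

    # 2. Score each group by its single best-scoring identified peptide.
    scored = []
    for peptides, members in groups.items():
        observed = [peptide_scores[p] for p in peptides if p in peptide_scores]
        if observed:
            decoy = all(is_decoy[m] for m in members)
            scored.append((max(observed), tuple(sorted(members)), decoy))

    # 3. Walk down the ranked list and estimate FDR as decoys / targets
    #    (an assumed, simplified target-decoy estimator).
    scored.sort(key=lambda entry: entry[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, _, decoy in scored:
        if decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # 4. q value = smallest FDR at this score threshold or any more permissive one.
    q_values = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()

    return [(members, score, decoy, q)
            for (score, members, decoy), q in zip(scored, q_values)]
```

Calling the function with peptide scores, a protein-to-peptide map, and decoy flags returns one tuple per protein group, ranked by score. Proteins with identical peptide sets are reported together as one group, which is the grouping step the abstract describes.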
Pages: 1719-1727
Page count: 9
Related papers
50 in total
  • [1] Protein Identification False Discovery Rates for Very Large Proteomics Data Sets Generated by Tandem Mass Spectrometry
    Reiter, Lukas
    Claassen, Manfred
    Schrimpf, Sabine P.
    Jovanovic, Marko
    Schmidt, Alexander
    Buhmann, Joachim M.
    Hengartner, Michael O.
    Aebersold, Ruedi
    MOLECULAR & CELLULAR PROTEOMICS, 2009, 8 (11) : 2405 - 2417
  • [2] A fast hierarchical clustering algorithm for large-scale protein sequence data sets
    Szilagyi, Sandor M.
    Szilagyi, Laszlo
    COMPUTERS IN BIOLOGY AND MEDICINE, 2014, 48 : 94 - 101
  • [3] Optimal Control of Directional False Discovery Rates in Large-Scale Testing
    Tang, Guozhu
    Kang, Yicheng
    Xiang, Dongdong
    STATISTICS IN MEDICINE, 2025, 44 (05)
  • [4] False discovery rates for large-scale model checking under certain dependence
    Deng, Lu
    Zi, Xuemin
    Li, Zhonghua
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2018, 47 (01) : 64 - 79
  • [5] False Discovery Rate Control for Fast Screening of Large-Scale Genomics Biobanks
    Machkour, Jasin
    Muma, Michael
    Palomar, Daniel P.
    2023 IEEE STATISTICAL SIGNAL PROCESSING WORKSHOP, SSP, 2023, : 666 - 670
  • [6] DISVMs: Fast SVMs Training on Large-scale Data Sets
    Cui, Lijuan
    Wang, Changjian
    Li, Ziyang
    Peng, Yuxing
    2016 IEEE 28TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2016), 2016, : 967 - 971
  • [7] Application of de Novo Sequencing to Large-Scale Complex Proteomics Data Sets
    Devabhaktuni, Arun
    Elias, Joshua E.
    JOURNAL OF PROTEOME RESEARCH, 2016, 15 (03) : 732 - 742
  • [8] Comparative assessment of large-scale data sets of protein–protein interactions
    Christian von Mering
    Roland Krause
    Berend Snel
    Michael Cornell
    Stephen G. Oliver
    Stanley Fields
    Peer Bork
    Nature, 2002, 417 : 399 - 403
  • [9] A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics
    Halloran, John T.
    Rocke, David M.
    JOURNAL OF PROTEOME RESEARCH, 2018, 17 (05) : 1978 - 1982
  • [10] Fast and fully-automated histograms for large-scale data sets
    Mendizabal, Valentina Zelaya
    Boulle, Marc
    Rossi, Fabrice
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2023, 180