A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics

Cited by: 10
Authors
Halloran, John T. [1 ]
Rocke, David M. [2 ]
Affiliations
[1] Univ Calif Davis, Dept Publ Hlth Sci, Davis, CA 95616 USA
[2] Univ Calif Davis, Div Biostat, Davis, CA 95616 USA
Funding
US National Institutes of Health
Keywords
tandem mass spectrometry; machine learning; support vector machine; Percolator; TRON; sensitive peptide identification; false discovery rates; MS-GF+; shotgun proteomics; Newton method; accurate; database
DOI
10.1021/acs.jproteome.7b00767
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Percolator is an important tool for improving the results of a database search and subsequent downstream analysis. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches based on the learned decision boundary between targets and decoys. To reduce analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations, rather than through heuristic approaches whose impact on learned parameters would need careful study across different search settings and data sets. We show that by optimizing Percolator's original learning algorithm, L2-SVM-MFN, large-scale SVM learning requires only about a third of the original runtime. Furthermore, we show that by employing the widely used Trust Region Newton (TRON) algorithm instead of L2-SVM-MFN, large-scale Percolator SVM learning is reduced to about a fifth of the original runtime. Importantly, these speedups affect only the speed at which Percolator converges to a global solution and do not alter recalibration performance. The upgraded versions of both L2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-thread use and are available under the Apache license at bitbucket.org/jthalloran/percolator_upgrade.
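
The abstract refers to Newton-type solvers for the primal L2-SVM. As a rough illustration of that family of methods, the following is a minimal sketch, assuming the standard L2-regularized squared-hinge objective f(w) = 0.5·||w||² + C·Σ max(0, 1 − yᵢ w·xᵢ)². It uses a truncated Newton step (conjugate gradient on generalized Hessian-vector products) with a backtracking line search. It is neither L2-SVM-MFN nor TRON as implemented in Percolator; TRON, for instance, controls the step with a trust region rather than a line search. All function names and the toy target/decoy data are hypothetical.

```python
# Illustrative sketch only: pure-Python truncated Newton for the primal L2-SVM.
# Not Percolator's code; names and data below are made up for the example.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def objective(w, X, y, C):
    # 0.5 * ||w||^2 + C * sum of squared hinge losses
    loss = sum(max(0.0, 1.0 - yi * dot(w, xi)) ** 2 for xi, yi in zip(X, y))
    return 0.5 * dot(w, w) + C * loss

def gradient(w, X, y, C):
    g = list(w)
    for xi, yi in zip(X, y):
        margin = 1.0 - yi * dot(w, xi)
        if margin > 0.0:  # only margin-violating examples contribute
            for j, xij in enumerate(xi):
                g[j] -= 2.0 * C * yi * margin * xij
    return g

def hessian_vec(w, v, X, y, C):
    # Generalized Hessian (I + 2C * X_A^T X_A) times v, A = active examples
    hv = list(v)
    for xi, yi in zip(X, y):
        if 1.0 - yi * dot(w, xi) > 0.0:
            s = 2.0 * C * dot(xi, v)
            for j, xij in enumerate(xi):
                hv[j] += s * xij
    return hv

def conjugate_gradient(w, g, X, y, C, iters=50, tol=1e-12):
    # Approximately solve H d = -g without forming H explicitly
    d = [0.0] * len(g)
    r = [-gi for gi in g]
    p = list(r)
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        hp = hessian_vec(w, p, X, y, C)
        alpha = rr / dot(p, hp)
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, hp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return d

def train_l2svm(X, y, C=1.0, max_newton=50, tol=1e-8):
    w = [0.0] * len(X[0])
    for _ in range(max_newton):
        g = gradient(w, X, y, C)
        if dot(g, g) < tol:
            break
        d = conjugate_gradient(w, g, X, y, C)
        f0, slope, step = objective(w, X, y, C), dot(g, d), 1.0
        while step > 1e-12:  # Armijo backtracking line search
            cand = [wi + step * di for wi, di in zip(w, d)]
            if objective(cand, X, y, C) <= f0 + 1e-4 * step * slope:
                break
            step *= 0.5
        w = cand
    return w

# Toy data: one score feature plus a constant 1.0 acting as a (regularized) bias.
# Label +1 stands in for a target match, -1 for a decoy match.
X_toy = [[2.0, 1.0], [1.5, 1.0], [3.0, 1.0], [-1.0, 1.0], [-2.0, 1.0], [-0.5, 1.0]]
y_toy = [1, 1, 1, -1, -1, -1]
w_toy = train_l2svm(X_toy, y_toy, C=1.0)
print("learned weights:", w_toy)
```

Working with Hessian-vector products keeps each Newton step linear in the number of examples and features, which is the general reason such solvers suit large data sets; the specific optimizations reported in the paper are not reproduced here.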
Pages: 1978-1982
Page count: 5