A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics

Cited by: 10
Authors
Halloran, John T. [1 ]
Rocke, David M. [2 ]
Affiliations
[1] Univ Calif Davis, Dept Publ Hlth Sci, Davis, CA 95616 USA
[2] Univ Calif Davis, Div Biostat, Davis, CA 95616 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
tandem mass spectrometry; machine learning; support vector machine; percolator; TRON; SENSITIVE PEPTIDE IDENTIFICATION; FALSE DISCOVERY RATES; MS-GF PLUS; SHOTGUN PROTEOMICS; NEWTON METHOD; ACCURATE; DATABASE;
DOI
10.1021/acs.jproteome.7b00767
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Percolator is an important tool for greatly improving the results of a database search and subsequent downstream analysis. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches based on the learned decision boundary between targets and decoys. To improve analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations, rather than through heuristic approaches whose impact on learned parameters would need careful study across different search settings and data sets. We show that by optimizing Percolator's original learning algorithm, l2-SVM-MFN, large-scale SVM learning requires only about a third of the original runtime. Furthermore, we show that by employing the widely used Trust Region Newton (TRON) algorithm instead of l2-SVM-MFN, large-scale Percolator SVM learning is reduced to only about a fifth of the original runtime. Importantly, these speedups affect only the speed at which Percolator converges to a global solution and do not alter recalibration performance. The upgraded versions of both l2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-thread use and are available under the Apache license at bitbucket.org/jthalloran/percolator_upgrade.
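As a rough illustration of the kind of SVM learning the abstract refers to, the sketch below trains an L2-regularized, squared-hinge-loss linear SVM with a plain Newton iteration and backtracking line search on synthetic target/decoy scores. It is not Percolator's code and not the authors' l2-SVM-MFN or TRON implementation: it solves each Newton system densely rather than by conjugate gradients, and it omits Percolator's cross-validation and iterative relabeling. All function names, parameters, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def l2svm_objective(w, X, y, C):
    """0.5*||w||^2 + C * sum(max(0, 1 - y_i * w.x_i)^2)."""
    margins = 1.0 - y * (X @ w)
    return 0.5 * (w @ w) + C * np.sum(np.maximum(margins, 0.0) ** 2)

def train_l2svm_newton(X, y, C=1.0, max_iter=50, tol=1e-8):
    """Hypothetical helper: Newton's method on the squared-hinge SVM objective."""
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(max_iter):
        # Only examples that violate the margin contribute to the loss.
        active = y * (X @ w) < 1.0
        Xa, ya = X[active], y[active]
        grad = w + 2.0 * C * (Xa.T @ (Xa @ w - ya))
        if np.linalg.norm(grad) < tol:
            break
        # Generalized Hessian of the piecewise-quadratic objective.
        hess = np.eye(d) + 2.0 * C * (Xa.T @ Xa)
        step = np.linalg.solve(hess, -grad)
        # Backtracking (Armijo) line search keeps the objective decreasing.
        t, f0 = 1.0, l2svm_objective(w, X, y, C)
        while (l2svm_objective(w + t * step, X, y, C) > f0 + 1e-4 * t * (grad @ step)
               and t > 1e-10):
            t *= 0.5
        w = w + t * step
    return w

def make_demo_data(n_per_class=100, seed=0):
    """Synthetic stand-in for PSM features: targets (+1) and decoys (-1)."""
    rng = np.random.default_rng(seed)
    targets = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 2))
    decoys = rng.normal(loc=-2.0, scale=1.0, size=(n_per_class, 2))
    feats = np.vstack([targets, decoys])
    X = np.hstack([feats, np.ones((2 * n_per_class, 1))])  # append bias column
    y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y

X, y = make_demo_data()
w = train_l2svm_newton(X, y, C=1.0)
accuracy = np.mean(np.sign(X @ w) == y)
print("training accuracy:", accuracy)
```

The learned scores `X @ w` play the role of recalibrated match scores in this toy setting; a trust-region or conjugate-gradient solver would replace the dense `np.linalg.solve` call when the feature matrix is large.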
Pages: 1978-1982
Page count: 5
Related papers
50 in total
  • [21] Large-Scale Video Hashing via Structure Learning
    Ye, Guangnan
    Liu, Dong
    Wang, Jun
    Chang, Shih-Fu
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 2272 - 2279
  • [22] SVM ensemble based transfer learning for large-scale membrane proteins discrimination
    Mei, Suyu
    JOURNAL OF THEORETICAL BIOLOGY, 2014, 340 : 105 - 110
  • [23] ACTIVE LEARNING FOR LARGE-SCALE FACTOR ANALYSIS
    Silva, Jorge
    Carin, Lawrence
    2012 IEEE STATISTICAL SIGNAL PROCESSING WORKSHOP (SSP), 2012, : 161 - 164
  • [24] Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning
    Lin, Kaixiang
    Zhao, Renyu
    Xu, Zhe
    Zhou, Jiayu
    KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, : 1774 - 1783
  • [25] Analysis of Large-Scale SVM Training Algorithms for Language and Speaker Recognition
    Cumani, Sandro
    Laface, Pietro
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (05) : 1585 - 1596
  • [26] Influence of various endogenous and artefact modifications on large-scale proteomics analysis
    Bienvenut, Willy V.
    Sumpton, David
    Lilla, Sergio
    Martinez, Aude
    Meinnel, Thierry
    Giglione, Carmela
    RAPID COMMUNICATIONS IN MASS SPECTROMETRY, 2013, 27 (03) : 443 - 450
  • [27] Efficient Large-Scale Video Retrieval via Discriminative Signatures
    Hao, Pengyi
    Kamata, Sei-ichiro
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (08) : 1800 - 1810
  • [28] Efficient large-scale configuration via integer linear programming
    Feinerer, Ingo
    AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFACTURING, 2013, 27 (01) : 37 - 49
  • [29] An Analysis Framework for Large-Scale Time Series
    Teng F.
    Huang Q.-C.
    Li T.-R.
    Wang C.
    Tian C.-H.
    Jisuanji Xuebao/Chinese Journal of Computers, 2020, 43 (07) : 1279 - 1292
  • [30] Efficient gene orthology inference via large-scale rearrangements
    Rubert, Diego P.
    Braga, Marilia D. V.
    ALGORITHMS FOR MOLECULAR BIOLOGY, 2023, 18 (01)