A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics

Cited by: 10
Authors:
Halloran, John T. [1]
Rocke, David M. [2]
Affiliations:
[1] Univ Calif Davis, Dept Publ Hlth Sci, Davis, CA 95616 USA
[2] Univ Calif Davis, Div Biostat, Davis, CA 95616 USA
Funding:
US National Institutes of Health
Keywords:
tandem mass spectrometry; machine learning; support vector machine; percolator; TRON; sensitive peptide identification; false discovery rates; MS-GF+; shotgun proteomics; Newton method; accurate; database
DOI:
10.1021/acs.jproteome.7b00767
Chinese Library Classification:
Q5 [Biochemistry]
Discipline codes:
071010; 081704
Abstract:
Percolator is an important tool for greatly improving the results of a database search and subsequent downstream analysis. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches based on the learned decision boundary between targets and decoys. To improve analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations rather than heuristic approaches that necessitate the careful study of their impact on learned parameters across different search settings and data sets. We show that by optimizing Percolator's original learning algorithm, l2-SVM-MFN, large-scale SVM learning requires only about a third of the original runtime. Furthermore, we show that by employing the widely used Trust Region Newton (TRON) algorithm instead of l2-SVM-MFN, large-scale Percolator SVM learning is reduced to only about a fifth of the original runtime. Importantly, these speedups only affect the speed at which Percolator converges to a global solution and do not alter recalibration performance. The upgraded versions of both l2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-thread use and are available under the Apache license at bitbucket.org/jthalloran/percolator_upgrade.
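The optimization the abstract describes — fitting a linear l2-loss SVM to target vs. decoy feature vectors with a Newton-type solver — can be sketched as follows. This is a minimal illustration under stated assumptions, not Percolator's implementation: it takes a plain generalized-Newton step with a direct solve in place of TRON's trust-region conjugate-gradient inner loop, and the toy data and all names (`l2svm_newton`, `C`, etc.) are invented for the example.

```python
import numpy as np

def l2svm_objective(w, X, y, C):
    # f(w) = 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * w.x_i)^2
    margins = np.maximum(0.0, 1.0 - y * (X @ w))
    return 0.5 * (w @ w) + C * np.sum(margins ** 2)

def l2svm_newton(X, y, C=1.0, iters=20, tol=1e-8):
    """Train a linear l2-loss SVM with generalized-Newton steps.

    The l2 hinge loss is differentiable, so each iteration solves the
    Newton system exactly; TRON instead solves it approximately with
    conjugate gradient inside a trust region, which scales to the
    high-dimensional problems Percolator faces.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        active = y * (X @ w) < 1.0           # examples violating the margin
        Xa, ya = X[active], y[active]
        grad = w + 2.0 * C * (Xa.T @ (Xa @ w - ya))
        if np.linalg.norm(grad) < tol:
            break
        hess = np.eye(d) + 2.0 * C * (Xa.T @ Xa)   # generalized Hessian
        w = w - np.linalg.solve(hess, grad)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # toy stand-ins for target (+1) and decoy (-1) PSM feature vectors
    X = np.vstack([rng.normal(+1.0, size=(200, 3)),
                   rng.normal(-1.0, size=(200, 3))])
    y = np.concatenate([np.ones(200), -np.ones(200)])
    w = l2svm_newton(X, y, C=1.0)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

On this easy synthetic data a handful of Newton iterations suffice; the paper's point is that the same objective, optimized more carefully (upgraded l2-SVM-MFN or TRON), reaches the same global solution in a fraction of the time.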
Pages: 1978-1982 (5 pages)