A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics

Cited by: 10
Authors
Halloran, John T. [1 ]
Rocke, David M. [2 ]
Affiliations
[1] Univ Calif Davis, Dept Publ Hlth Sci, Davis, CA 95616 USA
[2] Univ Calif Davis, Div Biostat, Davis, CA 95616 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
tandem mass spectrometry; machine learning; support vector machine; percolator; TRON; SENSITIVE PEPTIDE IDENTIFICATION; FALSE DISCOVERY RATES; MS-GF PLUS; SHOTGUN PROTEOMICS; NEWTON METHOD; ACCURATE; DATABASE;
DOI
10.1021/acs.jproteome.7b00767
Chinese Library Classification (CLC) code
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Percolator is an important tool for improving the results of a database search and subsequent downstream analysis. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches based on the learned decision boundary between targets and decoys. To improve analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations, rather than through heuristic approaches that would require careful study of their impact on learned parameters across different search settings and data sets. We show that by optimizing Percolator's original learning algorithm, L2-SVM-MFN, large-scale SVM learning requires only about a third of the original runtime. Furthermore, we show that by employing the widely used Trust Region Newton (TRON) algorithm instead of L2-SVM-MFN, large-scale Percolator SVM learning is reduced to only about a fifth of the original runtime. Importantly, these speedups affect only the speed at which Percolator converges to a global solution and do not alter recalibration performance. The upgraded versions of both L2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-thread use and are available under the Apache license at bitbucket.org/jthalloran/percolator_upgrade.
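As a rough illustration of the kind of learning problem the abstract describes, the sketch below fits a linear SVM with the L2 (squared-hinge) loss to toy target/decoy scores and uses the learned boundary to rescore them. It is not Percolator's code and implements neither L2-SVM-MFN nor TRON: plain gradient descent stands in as the simplest possible solver, and the feature values, function names, and parameters (`C`, `step`, `iters`) are all assumptions made for the example.

```python
# Illustrative sketch only: L2-loss (squared-hinge) linear SVM trained with
# plain gradient descent. NOT Percolator's code, L2-SVM-MFN, or TRON.
# All data and names below are hypothetical.

def objective_and_gradient(w, X, y, C):
    """Return f(w) = 0.5*||w||^2 + C * sum(max(0, 1 - y_i * w.x_i)^2) and its gradient."""
    f = 0.5 * sum(wj * wj for wj in w)
    grad = list(w)  # gradient of the regularizer is w itself
    for xi, yi in zip(X, y):
        margin = 1.0 - yi * sum(wj * xj for wj, xj in zip(w, xi))
        if margin > 0.0:  # only margin-violating examples contribute
            f += C * margin * margin
            for j, xj in enumerate(xi):
                grad[j] -= 2.0 * C * margin * yi * xj
    return f, grad

def train(X, y, C=1.0, step=0.01, iters=2000):
    """Minimize the objective with fixed-step gradient descent from w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        _, g = objective_and_gradient(w, X, y, C)
        w = [wj - step * gj for wj, gj in zip(w, g)]
    return w

def score(w, x):
    """Recalibrated score: signed distance-like value w.x."""
    return sum(wj * xj for wj, xj in zip(w, x))

# Toy peptide-spectrum matches: [search score, constant bias feature].
# Label +1 marks a target match, -1 a decoy match.
X = [[3.0, 1.0], [2.5, 1.0], [2.8, 1.0], [0.5, 1.0], [0.8, 1.0], [1.0, 1.0]]
y = [1, 1, 1, -1, -1, -1]

w = train(X, y)
scores = [score(w, x) for x in X]
```

Because this objective is strongly convex, any convergent solver reaches the same minimizer; a faster solver changes how quickly the solution is reached, not the solution itself, which is consistent with the abstract's statement that the speedups do not alter recalibration performance.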
Pages: 1978-1982
Page count: 5