Fast Open Modification Spectral Library Searching through Approximate Nearest Neighbor Indexing

Cited by: 49
Authors
Bittremieux, Wout [1, 2, 3]
Meysman, Pieter [1, 2]
Noble, William Stafford [3, 4]
Laukens, Kris [1, 2]
Affiliations
[1] Univ Antwerp, Dept Math & Comp Sci, B-2020 Antwerp, Belgium
[2] Biomed Informat Network Antwerpen Biomina, B-2020 Antwerp, Belgium
[3] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[4] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Funding
Research Foundation Flanders (FWO); National Institutes of Health (NIH)
Keywords
mass spectrometry; proteomics; open modification searching; spectral library; post-translational modifications; approximate nearest neighbors; TANDEM MASS-SPECTRA; PEPTIDE IDENTIFICATION; POSTTRANSLATIONAL MODIFICATIONS; SHOTGUN PROTEOMICS; PROTEIN MODIFICATIONS; DATABASE SEARCH; MS/MS SPECTRA; SPECTROMETRY; DISCOVERY; QUANTIFICATION;
DOI
10.1021/acs.jproteome.8b00359
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Open modification searching (OMS) is a powerful search strategy that identifies peptides carrying any type of modification: by using a very wide precursor mass window, it allows a modified spectrum to match against its unmodified variant. A drawback of this strategy, however, is that it leads to a large increase in search time. Although an open search can be performed with existing spectral library search engines by simply setting a wide precursor mass window, none of these tools has been optimized for OMS, leading to excessive runtimes and suboptimal identification results. We present the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. This approach is combined with a cascade search strategy, which maximizes the number of identified unmodified and modified spectra while strictly controlling the false discovery rate, and with a shifted dot product score, which sensitively matches modified spectra to their unmodified counterparts. ANN-SoLo achieves state-of-the-art performance in terms of speed and the number of identifications. On a previously published human cell line data set, ANN-SoLo confidently identifies more spectra than SpectraST or MSFragger and achieves a speedup of an order of magnitude compared with SpectraST. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.
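The shifted dot product mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the ANN-SoLo implementation; the function name `shifted_dot`, the greedy peak assignment, the default tolerance of 0.02, and the assumption of singly charged fragments are all illustrative choices made here. The idea shown is that a query peak may match a library peak either at the same m/z or at an m/z offset by the precursor mass difference, so that fragments carrying the modification still contribute to the score.

```python
import math


def shifted_dot(query, library, mass_shift, tol=0.02):
    """Illustrative shifted dot product between two peak lists.

    query, library: lists of (mz, intensity) tuples.
    mass_shift: query precursor mass minus library precursor mass
                (hypothetical convention; singly charged fragments assumed).
    tol: absolute fragment m/z tolerance (assumed value).
    """

    def normalise(peaks):
        # Scale intensities to unit Euclidean norm so the score lies in [0, 1].
        norm = math.sqrt(sum(i * i for _, i in peaks))
        return [(mz, i / norm) for mz, i in peaks] if norm > 0 else []

    q, lib = normalise(query), normalise(library)

    # Collect every peak pair that matches directly or after the mass shift.
    candidates = []
    for qi, (q_mz, q_int) in enumerate(q):
        for li, (l_mz, l_int) in enumerate(lib):
            delta = q_mz - l_mz
            if abs(delta) <= tol or abs(delta - mass_shift) <= tol:
                candidates.append((q_int * l_int, qi, li))

    # Greedy assignment: largest intensity products first, each peak used once.
    candidates.sort(reverse=True)
    used_q, used_l = set(), set()
    score = 0.0
    for product, qi, li in candidates:
        if qi not in used_q and li not in used_l:
            score += product
            used_q.add(qi)
            used_l.add(li)
    return score


# Toy spectra: two of the three query fragments carry a +16 modification.
library_peaks = [(100.0, 1.0), (200.0, 1.0), (300.0, 1.0)]
query_peaks = [(100.0, 1.0), (216.0, 1.0), (316.0, 1.0)]

direct_score = shifted_dot(query_peaks, library_peaks, mass_shift=0.0)
shifted_score = shifted_dot(query_peaks, library_peaks, mass_shift=16.0)
```

In this toy case only one of three equally intense peaks matches without the shift, so `direct_score` is about 1/3, whereas allowing the 16-unit shift matches all three peaks and `shifted_score` is about 1. In a tool of this kind, such a score would only be computed for the small set of library candidates returned by the approximate nearest neighbor index rather than for the whole library.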
Pages: 3463-3474
Page count: 12