A universal database reduction method based on the sequence tag strategy to facilitate large-scale database search in proteomics

Cited by: 2
Authors
Wang, Kai-Fei [1 ,2 ]
Wu, Yu-Zhuo [1 ,2 ]
Chi, Hao [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, CAS, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Keywords
MASS SPECTROMETRISTS; PEPTIDES; TANDEM; IDENTIFICATION; METAPROTEOMICS; PROTEINS; IDENTIFY; SPECTRA;
DOI
10.1016/j.ijms.2022.116966
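Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics]; O56 [Molecular physics, atomic physics];
Discipline classification codes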
070203 ; 070304 ; 081704 ; 1406 ;
Abstract
Mass spectrometry-based metaproteomic and proteogenomic studies tend to use large-scale databases that may contain many irrelevant or artificially constructed proteins. Such an imprecise database degrades the quality of peptide identification and increases search time. To address both problems, we developed DBReducer, a database reduction method for iterative database searching that reduces a large-scale database precisely and effectively and can interface with any downstream database search engine. In addition, an entrapment strategy was introduced to evaluate the identification precision and recall of different search modes. Compared with the common one-step database search and the traditional iterative database search, the iterative search with DBReducer improved average peptide identification recall from 67.8% and 83.7%, respectively, to 93.5%, and improved average peptide identification precision from 91.1% and 89.6%, respectively, to 91.3%. More importantly, using DBReducer reduced the time consumption by an average of 57.7% and 68.2%, respectively. Our results indicate that DBReducer has the potential to be a widely used database reduction method prior to common proteomic analysis, especially for scenarios with large-scale databases. © 2022 Published by Elsevier B.V.
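The general idea of tag-based database reduction can be illustrated with a minimal sketch. This is not DBReducer's actual algorithm, which the abstract does not describe; it assumes that short sequence tags have already been extracted from the spectra and simply keeps every protein containing at least a chosen number of distinct tags. The function name `reduce_database`, the parameter `min_tag_hits`, and the toy sequences are all hypothetical.

```python
from typing import Dict, Iterable, Set


def reduce_database(
    proteins: Dict[str, str],
    tags: Iterable[str],
    min_tag_hits: int = 1,
) -> Dict[str, str]:
    """Keep proteins whose sequence contains at least `min_tag_hits` distinct tags.

    Hypothetical illustration only: a real tool would also have to handle
    tag extraction, mass tolerances, and modifications.
    """
    # Normalise tags to upper case and drop empty strings.
    tag_set: Set[str] = {t.upper() for t in tags if t}
    reduced: Dict[str, str] = {}
    for protein_id, sequence in proteins.items():
        seq = sequence.upper()
        # Count how many distinct tags occur as substrings of this protein.
        hits = sum(1 for tag in tag_set if tag in seq)
        if hits >= min_tag_hits:
            reduced[protein_id] = sequence
    return reduced


if __name__ == "__main__":
    # Toy database and tags (made up for illustration).
    toy_proteins = {
        "P1": "MKTAYIAKQRQISFVK",
        "P2": "GAVLIPFMWST",
        "P3": "MSSSDDDEEE",
    }
    toy_tags = ["KQRQ", "PFMW"]
    print(sorted(reduce_database(toy_proteins, toy_tags)))
```

The reduced dictionary could then be written out as a FASTA file and passed to whichever search engine is used for the second search pass; the substring scan here is quadratic in the worst case, so an indexed lookup would be preferable for databases of realistic size.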
Pages: 11