A universal database reduction method based on the sequence tag strategy to facilitate large-scale database search in proteomics

Cited by: 2
Authors
Wang, Kai-Fei [1 ,2 ]
Wu, Yu-Zhuo [1 ,2 ]
Chi, Hao [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, CAS, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Keywords
MASS SPECTROMETRISTS; PEPTIDES; TANDEM; IDENTIFICATION; METAPROTEOMICS; PROTEINS; IDENTIFY; SPECTRA;
DOI
10.1016/j.ijms.2022.116966
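Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics]; O56 [Molecular physics, atomic physics];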
Discipline classification codes
070203 ; 070304 ; 081704 ; 1406 ;
Abstract
Mass spectrometry-based metaproteomic and proteogenomic studies tend to use large-scale databases that may contain many irrelevant or artificially constructed proteins. Such an imprecise database poses challenges for both the quality of peptide identification and the search time. To address these challenges, we developed DBReducer, a database reduction method for iterative database searching that precisely and effectively reduces a large-scale database and can interface with any downstream database search engine. In addition, an entrapment strategy was introduced to evaluate the identification precision and recall of different search modes. Compared with the common one-step database search and the traditional iterative database search, the iterative search with DBReducer improved peptide identification recall from an average of 67.8% and 83.7%, respectively, to 93.5%, and improved peptide identification precision from an average of 91.1% and 89.6%, respectively, to 91.3%. More importantly, DBReducer reduced the time consumption by an average of 57.7% and 68.2%, respectively. Our results indicate that DBReducer has the potential to become a widely used database reduction method prior to common proteomic analysis, especially in scenarios with large-scale databases. © 2022 Published by Elsevier B.V.
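The abstract does not describe how DBReducer works internally, so the following is only a minimal sketch of the general idea behind sequence-tag-based database reduction: keep the proteins whose sequences contain at least a few short tags inferred from spectra. The function name `reduce_database`, the `min_hits` parameter, and the toy sequences are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable


def reduce_database(
    proteins: Dict[str, str],
    tags: Iterable[str],
    min_hits: int = 1,
) -> Dict[str, str]:
    """Keep proteins containing at least `min_hits` distinct sequence tags.

    Illustrative only: exact substring matching stands in for whatever
    tag extraction and scoring a real tool would use.
    """
    tag_set = {t for t in tags if t}  # drop empty tags and duplicates
    reduced = {}
    for name, seq in proteins.items():
        hits = sum(1 for t in tag_set if t in seq)
        if hits >= min_hits:
            reduced[name] = seq
    return reduced


# Hypothetical toy database and tags (not real data).
proteins = {
    "P1": "MKTAYIAKQRQISFVK",
    "P2": "GGGGSGGGGS",
    "P3": "AAQRQISLLL",
}
tags = ["QRQIS", "TAYIA"]

reduced = reduce_database(proteins, tags)
strict = reduce_database(proteins, tags, min_hits=2)
```

With these toy inputs, `reduced` retains `P1` and `P3`, while the stricter `min_hits=2` setting retains only `P1`. The reduced dictionary could then be written back out as a smaller database for whichever downstream search engine is in use.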
Pages: 11
Related papers
50 in total
  • [31] Prediction of error associated with false-positive rate determination for peptide identification in large-scale proteomics experiments using a combined reverse and forward peptide sequence database strategy
    Huttlin, Edward L.
    Hegeman, Adrian D.
    Harms, Amy C.
    Sussman, Michael R.
    JOURNAL OF PROTEOME RESEARCH, 2007, 6 (01) : 392 - 398
  • [32] Large-scale intact glycopeptide identification by Mascot database search (vol 8, 2117, 2018)
    Bollineni, Ravi Chand
    Koehler, Christian Jeffrey
    Gislefoss, Randi Elin
    Anonsen, Jan Haug
    Thiede, Bernd
    SCIENTIFIC REPORTS, 2018, 8
  • [33] Large-scale deployment of three intelligent web-based database tutors
    Mitrovic, Antonija
    J. Compt. Inf. Technol., 4 (275-281):
  • [34] A study of the construction of a large-scale water quality spatial database based on ArcSDE
    Xu Shuna
    Yang Lingbin
    Zhang Xia
    Wu Jin
    FIRST INTERNATIONAL WORKSHOP ON DATABASE TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2009, : 279 - +
  • [35] A Large-Scale Database and a CNN Model for Attention-Based Glaucoma Detection
    Li, Liu
    Xu, Mai
    Liu, Hanruo
    Li, Yang
    Wang, Xiaofei
    Jiang, Lai
    Wang, Zulin
    Fan, Xiang
    Wang, Ningli
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (02) : 413 - 424
  • [36] Construction of Adverbial-Verb Collocation Database Based on Large-Scale Corpus
    Xing, Dan
    Xun, Endong
    Wang, Chengwen
    Rao, Gaoqi
    Ma, Luyao
    CHINESE LEXICAL SEMANTICS (CLSW 2019), 2020, 11831 : 585 - 595
  • [37] Mode of large-scale subject database's subdivision based on dependency relations
    Liu, Wenyuan
    Xu, Lina
    Chen, Guoying
    Wang, Baowen
    Journal of Computational Information Systems, 2008, 4 (02): : 509 - 514
  • [38] Large-Scale Data Storage and Management Scheme Based on Distributed Database Systems
    Sun, Qiao
    Deng, Bu-qiao
    Fu, Lan-mei
    Wang, Zhi-qiang
    Pei, Xu-bin
    Sun, Jia-Song
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND INTELLIGENT MANUFACTURING (ITIM 2017), 2017, 142 : 14 - 17
  • [39] Large-scale deployment of three intelligent Web-based database tutors
    Mitrovic, Antonija
    ITI 2006: PROCEEDINGS OF THE 28TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES, 2006, : 135 - 140
  • [40] A resource database for protein kinase substrate sequence-preference motifs based on large-scale mass spectrometry data
    Brian G. Poll
    Kirby T. Leo
    Venky Deshpande
    Nipun Jayatissa
    Trairak Pisitkun
    Euijung Park
    Chin-Rang Yang
    Viswanathan Raghuram
    Mark A. Knepper
    Cell Communication and Signaling, 22