A universal database reduction method based on the sequence tag strategy to facilitate large-scale database search in proteomics

Times cited: 2

Authors:

Wang, Kai-Fei [1,2]

Wu, Yu-Zhuo [1,2]

Chi, Hao [1,2]

Affiliations:

[1] Chinese Acad Sci, CAS, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

Source:

INTERNATIONAL JOURNAL OF MASS SPECTROMETRY | 2023 / Vol. 483

Keywords:

MASS SPECTROMETRISTS; PEPTIDES; TANDEM; IDENTIFICATION; METAPROTEOMICS; PROTEINS; IDENTIFY; SPECTRA

DOI:

10.1016/j.ijms.2022.116966

Chinese Library Classification (CLC) numbers:

O64 [Physical chemistry (theoretical chemistry), chemical physics]; O56 [Molecular physics, atomic physics]

Discipline classification codes:

070203; 070304; 081704; 1406

Abstract:

Mass spectrometry-based metaproteomic and proteogenomic studies tend to use large-scale databases that may contain too many irrelevant or artificially constructed proteins. Such imprecise databases compromise both the quality of peptide identification and the time cost of the search. To address these problems, we developed DBReducer, a database reduction method for iterative database searching that can precisely and effectively reduce a large-scale database and can interface with any downstream database search engine. In addition, an entrapment strategy was introduced to evaluate the identification precision and recall of different search modes. Compared with the common one-step database search and the traditional iterative database search, the iterative search with DBReducer improved the average peptide identification recall from 67.8% and 83.7%, respectively, to 93.5%, and the average peptide identification precision from 91.1% and 89.6%, respectively, to 91.3%. More importantly, using DBReducer reduced the time cost by an average of 57.7% and 68.2%, respectively. Our results indicate that DBReducer has the potential to become a widely used database reduction step prior to common proteomic analysis, especially in scenarios involving large-scale databases. (c) 2022 Published by Elsevier B.V.
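The abstract does not spell out how DBReducer uses sequence tags, so the following is only a minimal sketch of the general sequence-tag filtering idea named in the title, under stated assumptions: short tags (amino acid stretches read de novo from spectra) are already available as strings, a protein is kept if it contains at least one tag, and I and L are treated as identical because they are isobaric. The function name `reduce_database` and the toy sequences are hypothetical; this is not DBReducer's actual implementation.

```python
from typing import Dict, Iterable


def reduce_database(proteins: Dict[str, str], tags: Iterable[str]) -> Dict[str, str]:
    """Hypothetical sketch: keep proteins that contain at least one sequence tag.

    proteins: accession -> amino acid sequence (e.g. parsed from a FASTA file)
    tags: short de novo sequence tags (assumed already extracted from spectra)
    """

    def normalize(seq: str) -> str:
        # Isoleucine and leucine have identical masses, so a tag cannot
        # distinguish them; map both to 'L' before matching.
        return seq.upper().replace("I", "L")

    # Skip empty tags, which would otherwise match every protein.
    tag_set = {normalize(t) for t in tags if t}
    reduced: Dict[str, str] = {}
    for accession, sequence in proteins.items():
        normalized = normalize(sequence)
        # Naive substring scan; a large-scale tool would more likely use a
        # multi-pattern index such as Aho-Corasick.
        if any(tag in normalized for tag in tag_set):
            reduced[accession] = sequence  # keep the original, unnormalized sequence
    return reduced


if __name__ == "__main__":
    database = {
        "P1": "MKTAYIAKQR",
        "P2": "GGGSSSPPPW",
        "P3": "AVLDEFKR",
    }
    spectrum_tags = ["TAYL", "DEF"]
    kept = reduce_database(database, spectrum_tags)
    print(sorted(kept))  # ['P1', 'P3']
```

In an iterative workflow of the kind the abstract describes, the reduced dictionary would be written back out as FASTA and passed to whichever downstream search engine is in use for the second-round search.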

Pages: 11
