A large-scale assessment of sequence database search tools for homology-based protein function prediction

Cited by: 4

Authors:

Zhang, Chengxin [1]

Freddolino, Lydia [1]

Affiliations:

[1] Univ Michigan, Dept Computat Med & Bioinformat, Dept Biol Chem, 100 Washtenaw Ave, Ann Arbor, MI 48109 USA

Source:

BRIEFINGS IN BIOINFORMATICS | 2024 / Volume 25 / Issue 04

Keywords:

Gene Ontology; protein function prediction; sequence database search; BLASTp; DIAMOND; MMseqs2; ANNOTATION; GENERATION; ZEBRAFISH;

DOI:

10.1093/bib/bbae349

Chinese Library Classification:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. These searches are also a critical component in most state-of-the-art machine learning and deep learning-based protein function predictors. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tools and configure their parameters to achieve the best function prediction. In this paper, we evaluate the effect of using different options from among popular search tools, as well as the impacts of search parameters, on protein function prediction. When predicting GO terms on a large benchmark dataset, we found that, under default search parameters, BLASTp and MMseqs2 consistently exceed the performance of other tools, including DIAMOND, one of the most popular tools for function prediction. However, with the correct parameter settings, DIAMOND can perform comparably to BLASTp and MMseqs2 in function prediction. Additionally, we developed a new scoring function to derive GO predictions from homologous hits that consistently outperforms previously proposed scoring functions. These findings enable the improvement of almost all protein function prediction algorithms with a few easily implementable changes in their sequence homolog-based component. This study emphasizes the critical role of search parameter settings in homology-based function transfer and should make an important contribution to the development of future protein function prediction algorithms.
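
To make the idea of homology-based function transfer concrete, the following is a minimal sketch of one simple scheme: each GO term is scored by the share of total bit-score contributed by the annotated hits that carry it. This is an illustration only and is not the new scoring function developed in the paper, whose definition is not given in the abstract. The function name `transfer_go_terms`, the input layout, and the protein identifiers in the example are assumptions made for this sketch.

```python
from collections import defaultdict


def transfer_go_terms(hits, annotations):
    """Score GO terms for a query from its homologous hits.

    hits        -- list of (subject_id, bitscore) pairs from a sequence search
                   (hypothetical layout; any search tool could produce it)
    annotations -- dict mapping subject_id to a set of GO term IDs

    Returns a dict mapping each GO term to a score in [0, 1]: the summed
    bit-score of hits annotated with that term, divided by the summed
    bit-score of all hits that have at least one annotation.
    """
    total = 0.0
    per_term = defaultdict(float)
    for subject_id, bitscore in hits:
        terms = annotations.get(subject_id)
        if not terms:
            # Unannotated hits carry no functional evidence; skip them.
            continue
        total += bitscore
        for term in terms:
            per_term[term] += bitscore
    if total == 0:
        return {}
    return {term: score / total for term, score in per_term.items()}


# Illustrative inputs (made-up identifiers and bit-scores).
example_hits = [("P1", 200.0), ("P2", 100.0), ("P3", 50.0)]
example_annotations = {
    "P1": {"GO:0003677"},
    "P2": {"GO:0003677", "GO:0005634"},
}
example_scores = transfer_go_terms(example_hits, example_annotations)
```

In this example, hit P3 has no annotation and is ignored, so the term shared by P1 and P2 receives the full weight while the term found only on P2 receives one third. Which hits enter such a calculation depends on the search tool and its parameters, which is the dependence the abstract highlights.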

Pages: 12
