A large-scale assessment of sequence database search tools for homology-based protein function prediction

Cited by: 4
Authors
Zhang, Chengxin [1 ]
Freddolino, Lydia [1 ]
Affiliations
[1] Univ Michigan, Dept Computat Med & Bioinformat, Dept Biol Chem, 100 Washtenaw Ave, Ann Arbor, MI 48109 USA
Keywords
Gene Ontology; protein function prediction; sequence database search; BLASTp; DIAMOND; MMseqs2; ANNOTATION; GENERATION; ZEBRAFISH;
DOI
10.1093/bib/bbae349
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. These searches are also a critical component of most state-of-the-art machine learning and deep learning-based protein function predictors. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tool and configure its parameters to achieve the best function prediction. In this paper, we evaluate how the choice among popular search tools, as well as the choice of search parameters, affects protein function prediction. When predicting GO terms on a large benchmark dataset, we found that, under default search parameters, BLASTp and MMseqs2 consistently exceed the performance of other tools, including DIAMOND, one of the most popular tools for function prediction. However, with the correct parameter settings, DIAMOND can perform comparably to BLASTp and MMseqs2 in function prediction. Additionally, we developed a new scoring function for deriving GO predictions from homologous hits that consistently outperforms previously proposed scoring functions. These findings enable the improvement of almost all protein function prediction algorithms with a few easily implementable changes to their sequence homolog-based component. This study emphasizes the critical role of search parameter settings in homology-based function transfer and should contribute substantially to the development of future protein function prediction algorithms.
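As an illustration of what homology-based function transfer involves, the following Python sketch scores GO terms from a list of search hits using a simple bit-score-weighted vote. This is a generic baseline chosen for illustration, not the new scoring function developed in the paper, which the abstract does not specify. The function name `score_go_terms`, the hit format, and the example identifiers and scores are all hypothetical.

```python
from collections import defaultdict


def score_go_terms(hits, annotations):
    """Score GO terms by a bit-score-weighted vote over homologous hits.

    hits: list of (target_id, bitscore) pairs from a sequence search
          (hypothetical format; real tools emit tabular output to parse).
    annotations: dict mapping target_id to a set of GO term IDs.
    Returns a dict mapping each GO term to a confidence in [0, 1].
    """
    total = sum(bitscore for _, bitscore in hits)
    if total <= 0:
        return {}
    weighted = defaultdict(float)
    for target_id, bitscore in hits:
        # Each hit votes for its annotated terms with weight = bit score.
        for term in annotations.get(target_id, ()):
            weighted[term] += bitscore
    # Normalize by the summed bit score of all hits.
    return {term: w / total for term, w in weighted.items()}


# Made-up example: three hits with illustrative bit scores and annotations.
hits = [("P1", 200.0), ("P2", 100.0), ("P3", 100.0)]
annotations = {
    "P1": {"GO:0003677", "GO:0005634"},
    "P2": {"GO:0003677"},
    "P3": {"GO:0016301"},
}
scores = score_go_terms(hits, annotations)
```

Under a scheme like this, the set of hits returned by the search tool (and therefore its sensitivity settings) directly determines which GO terms receive support, which is why search parameters can matter for the final prediction.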
Pages: 12