A large-scale assessment of sequence database search tools for homology-based protein function prediction

Times cited: 4
Authors
Zhang, Chengxin [1]
Freddolino, Lydia [1]
Affiliation
[1] Univ Michigan, Dept Computat Med & Bioinformat, Dept Biol Chem, 100 Washtenaw Ave, Ann Arbor, MI 48109 USA
Keywords
Gene Ontology; protein function prediction; sequence database search; BLASTp; DIAMOND; MMseqs2; ANNOTATION; GENERATION; ZEBRAFISH
DOI
10.1093/bib/bbae349
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. These searches are also a critical component of most state-of-the-art machine learning and deep learning-based protein function predictors. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tool and configure its parameters to achieve the best function prediction. In this paper, we evaluate the effect of choosing among popular search tools, as well as the impact of search parameters, on protein function prediction. When predicting GO terms on a large benchmark dataset, we found that, under default search parameters, BLASTp and MMseqs2 consistently outperform other tools, including DIAMOND, one of the most popular tools for function prediction. However, with appropriate parameter settings, DIAMOND can perform comparably to BLASTp and MMseqs2 in function prediction. Additionally, we developed a new scoring function for deriving GO predictions from homologous hits that consistently outperforms previously proposed scoring functions. These findings enable the improvement of almost all protein function prediction algorithms with a few easily implementable changes to their sequence homolog-based component. This study emphasizes the critical role of search parameter settings in homology-based function transfer and should make an important contribution to the development of future protein function prediction algorithms.
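The abstract refers to scoring functions that turn the homologous hits from a sequence search into scored GO term predictions. As a rough illustration only, the sketch below implements two simple schemes of the kind used in earlier homology-based predictors: scoring a term by the highest sequence identity among hits carrying it, and scoring it by the share of total bitscore contributed by those hits. It is not the new scoring function proposed in this paper, which the abstract does not describe. Assumptions (all hypothetical, not from the source): hits have already been parsed from the search tool's tabular output into `(target_id, identity, bitscore)` tuples with identity as a fraction in [0, 1]; each target's GO annotations are already propagated to ancestor terms; and unannotated hits still count toward the bitscore denominator.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical hit representation: (target_id, identity in [0, 1], bitscore)
Hit = Tuple[str, float, float]


def max_identity_score(hits: List[Hit], go_annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each GO term by the highest identity among hits annotated with it."""
    scores: Dict[str, float] = defaultdict(float)
    for target, identity, _bits in hits:
        for term in go_annotations.get(target, set()):
            scores[term] = max(scores[term], identity)
    return dict(scores)


def bitscore_weighted_score(hits: List[Hit], go_annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each GO term by the fraction of total bitscore from hits annotated with it."""
    # Denominator includes every hit, annotated or not (an assumption of this sketch).
    total = sum(bits for _target, _identity, bits in hits)
    if total == 0:
        return {}
    scores: Dict[str, float] = defaultdict(float)
    for target, _identity, bits in hits:
        for term in go_annotations.get(target, set()):
            scores[term] += bits / total
    return dict(scores)


if __name__ == "__main__":
    # Toy data: three hits, of which only P1 and P2 carry GO annotations.
    hits = [("P1", 0.9, 200.0), ("P2", 0.5, 100.0), ("P3", 0.3, 100.0)]
    annotations = {"P1": {"GO:0003824"}, "P2": {"GO:0003824", "GO:0005737"}}
    print(max_identity_score(hits, annotations))
    print(bitscore_weighted_score(hits, annotations))
```

With the toy data, the identity-based scheme gives GO:0003824 a score of 0.9 and GO:0005737 a score of 0.5, while the bitscore-weighted scheme gives 0.75 and 0.25 respectively. Because both scores depend directly on which hits are returned and how they are scored, the choice of search tool and its parameters feeds straight into the final predictions.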
Pages: 12