Improving protein function prediction methods with integrated literature data

Cited by: 14
Authors
Gabow, Aaron P. [1]
Leach, Sonia M. [1, 3]
Baumgartner, William A. [1]
Hunter, Lawrence E. [1, 2]
Goldberg, Debra S. [1, 2]
Affiliations
[1] Univ Colorado, Dept Pharmacol, Denver Hlth Sci Ctr, Aurora, CO 80045 USA
[2] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA
[3] Katholieke Univ Leuven, Dept Elect Engn ESAT, Res Div SCD, B-3001 Louvain, Belgium
DOI
10.1186/1471-2105-9-198
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence, and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity.
Results: We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges, since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes, which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10%, which yield the best trade-off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder, but co-occurrence data still proves beneficial.
Conclusion: Co-occurrence data is a valuable supplemental source for graph-theoretic function prediction algorithms. A rapidly growing literature corpus ensures that co-occurrence data is a readily available resource for nearly every studied organism, particularly those with small protein interaction databases. Though arguably biased toward known genes, co-occurrence data provides critical additional links to well-studied regions in the interaction network that graph-theoretic function prediction algorithms can exploit.
Pages: 16