Improving protein function prediction methods with integrated literature data

Cited by: 14

Authors:

Gabow, Aaron P. [1]

Leach, Sonia M. [1,3]

Baumgartner, William A. [1]

Hunter, Lawrence E. [1,2]

Goldberg, Debra S. [1,2]

Affiliations:

[1] Univ Colorado, Dept Pharmacol, Denver Hlth Sci Ctr, Aurora, CO 80045 USA

[2] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA

[3] Katholieke Univ Leuven, Dept Elect Engn ESAT, Res Div SCD, B-3001 Louvain, Belgium

Source:

BMC BIOINFORMATICS | 2008 / Volume 9 / Issue 1

DOI:

10.1186/1471-2105-9-198

Chinese Library Classification number:

Q5 [Biochemistry];

Discipline classification codes:

071010; 081704

Abstract:

Background: Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence, and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity.

Results: We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges, since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes that global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10%, which yield the best trade-off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder, but co-occurrence data still proves beneficial.

Conclusion: Co-occurrence data is a valuable supplemental source for graph-theoretic function prediction algorithms. A rapidly growing literature corpus ensures that co-occurrence data is a readily available resource for nearly every studied organism, particularly those with small protein interaction databases. Though arguably biased toward known genes, co-occurrence data provides critical additional links to well-studied regions in the interaction network that graph-theoretic function prediction algorithms can exploit.
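
The contrast the abstract draws between asserting co-occurrence from a single shared abstract and requiring stronger evidence can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the toy protein labels, and in particular the raw abstract-count cutoff, which is only a stand-in and is not the reliability measure or the 10% threshold introduced in the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts):
    """Count, for each unordered protein pair, how many abstracts mention both.

    `abstracts` is an iterable of sets of protein names (hypothetical input format).
    """
    counts = Counter()
    for mentioned in abstracts:
        # Sort so each unordered pair always maps to the same tuple key.
        for pair in combinations(sorted(set(mentioned)), 2):
            counts[pair] += 1
    return counts


def edges_at_least(counts, min_abstracts=1):
    """Keep pairs seen together in at least `min_abstracts` abstracts.

    min_abstracts=1 reproduces the traditional "at least one abstract" rule;
    larger values are an illustrative filter, not the paper's method.
    """
    return {pair for pair, n in counts.items() if n >= min_abstracts}


# Toy corpus: each set lists the proteins mentioned in one abstract.
abstracts = [
    {"A", "B"},
    {"A", "B", "C"},
    {"C", "D"},
]

counts = cooccurrence_counts(abstracts)
naive_edges = edges_at_least(counts, min_abstracts=1)
stricter_edges = edges_at_least(counts, min_abstracts=2)
```

In this toy corpus the single-abstract rule links every pair that ever shares an abstract, while the stricter cutoff keeps only the pair supported by two abstracts. The resulting edge set could then be merged with protein-protein interaction edges before running a graph-based predictor.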

Pages: 16

