Improving protein function prediction methods with integrated literature data

Cited by: 14
Authors
Gabow, Aaron P. [1]
Leach, Sonia M. [1, 3]
Baumgartner, William A. [1]
Hunter, Lawrence E. [1, 2]
Goldberg, Debra S. [1, 2]
Affiliations
[1] Univ Colorado, Dept Pharmacol, Denver Hlth Sci Ctr, Aurora, CO 80045 USA
[2] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA
[3] Katholieke Univ Leuven, Dept Elect Engn ESAT, Res Div SCD, B-3001 Louvain, Belgium
DOI
10.1186/1471-2105-9-198
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence, and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity.
Results: We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges, since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes, which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10%, which yield the best trade-off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder, but co-occurrence data still proves beneficial.
Conclusion: Co-occurrence data is a valuable supplemental source for graph-theoretic function prediction algorithms. A rapidly growing literature corpus ensures that co-occurrence data is a readily available resource for nearly every studied organism, particularly those with small protein interaction databases. Though arguably biased toward known genes, co-occurrence data provides critical additional links to well-studied regions in the interaction network that graph-theoretic function prediction algorithms can exploit.
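The contrast the abstract draws between asserting co-occurrence from a single shared abstract and keeping only reliable pairs can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the abstract does not specify the authors' reliability measure, so a Jaccard score is swapped in as a stand-in, and the "around 10%" threshold is interpreted here as keeping the top-scoring fraction of pairs, which may not match the paper's definition. The function names and the `keep_fraction` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts):
    """Count single and pairwise protein mentions.

    abstracts: iterable of sets, each holding the protein names
    mentioned in one abstract (hypothetical input format).
    """
    single = Counter()
    pair = Counter()
    for mentioned in abstracts:
        proteins = sorted(set(mentioned))  # sorted so each pair has one canonical key
        single.update(proteins)
        pair.update(combinations(proteins, 2))
    return single, pair


def any_abstract_edges(pair):
    """Baseline rule: an edge exists if at least one abstract mentions both proteins."""
    return set(pair)


def scored_edges(single, pair, keep_fraction=0.10):
    """Keep only the top-scoring fraction of co-occurring pairs.

    Assumption: Jaccard similarity stands in for a reliability score;
    it is NOT the measure introduced in the paper.
    """
    scores = {
        (a, b): n / (single[a] + single[b] - n)
        for (a, b), n in pair.items()
    }
    # Rank by descending score, breaking ties by pair name for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:keep]
```

In this sketch, the baseline rule admits every pair seen together even once, while the scored variant discards pairs whose co-mentions are rare relative to how often each protein appears on its own. The surviving pairs could then be added as extra edges to a protein-protein interaction network before running a graph-based predictor.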
Pages: 16
Journal: BMC Bioinformatics, 9