Mining physical protein-protein interactions from the literature

Cited by: 18
Authors
Huang, Minlie [1 ]
Ding, Shilin [1 ]
Wang, Hongning [1 ]
Zhu, Xiaoyan [1 ]
Affiliation
[1] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Tsinghua Natl Lab Informat Sci & Technol, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Source
GENOME BIOLOGY | 2008 / Vol. 9
Funding
National High Technology Research and Development Program of China (863 Program);
DOI
10.1186/gb-2008-9-S2-S12
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Deciphering physical protein-protein interactions is fundamental to elucidating both the functions of proteins and biological processes. The development of high-throughput experimental technologies such as yeast two-hybrid screening has produced an explosion in data relating to interactions. Since manual curation is intensive in terms of time and cost, there is an urgent need for text-mining tools to facilitate the extraction of such information. The BioCreative (Critical Assessment of Information Extraction systems in Biology) challenge evaluation provided common standards and shared evaluation criteria to enable comparisons among different approaches.

Results: During the benchmark evaluation of BioCreative 2006, all of our results ranked in the top three places. In the task of filtering articles irrelevant to physical protein interactions, our method achieved a precision of 75.07%, a recall of 81.07%, and an AUC (area under the receiver operating characteristic curve) of 0.847. In the task of identifying protein mentions and normalizing mentions to molecule identifiers, our method was competitive among the runs submitted, with a precision of 34.83%, a recall of 24.10%, and an F-1 score of 28.5%. In extracting protein interaction pairs, our profile-based method was competitive on the SwissProt-only subset (precision = 36.95%, recall = 32.68%, and F-1 score = 30.40%) and on the entire dataset (30.96%, 29.35%, and 26.20%, respectively). From the biologist's point of view, however, these findings are far from satisfactory. The error analysis presented in this report provides insight into how performance could be improved: three-quarters of false negatives were due to protein normalization problems (532/698), and about one-quarter were due to problems with correctly extracting interactions for this system.

Conclusion: We present a text-mining framework to extract physical protein-protein interactions from the literature. Three key issues are addressed, namely filtering irrelevant articles, identifying protein names and normalizing them to molecule identifiers, and extracting protein-protein interactions. Our system is among the top three performers in the benchmark evaluation of BioCreative 2006. The tool will be helpful for manual interaction curation and can greatly facilitate the process of extracting protein-protein interactions.
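Note on the reported scores: the abstract does not define how the F-1 score is computed. The sketch below assumes the common definition, the harmonic mean of precision and recall. The function and variable names are illustrative and do not come from the paper. Under that assumption, the normalization-task figures reproduce the reported 28.5%, while the SwissProt-only pair figures give roughly 34.7% rather than the reported 30.40%. One possible explanation, which the abstract does not confirm, is that the pair-extraction scores are averaged per article instead of computed from pooled counts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Protein normalization task: reported P = 34.83%, R = 24.10%, F-1 = 28.5%.
norm_f1 = f1_score(0.3483, 0.2410)

# Interaction pairs, SwissProt-only subset: reported P = 36.95%, R = 32.68%, F-1 = 30.40%.
# The harmonic mean of the pooled figures need not match a per-article average.
pair_f1 = f1_score(0.3695, 0.3268)

# Share of false negatives attributed to protein normalization (532 of 698).
norm_fn_share = 532 / 698

print(f"normalization F-1: {norm_f1:.3f}")
print(f"pair extraction F-1 (harmonic mean): {pair_f1:.3f}")
print(f"false negatives from normalization: {norm_fn_share:.3f}")
```

The last value matches the abstract's "three-quarters of false negatives" statement.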
Pages: 13