Mining physical protein-protein interactions from the literature

Cited by: 18
Authors
Huang, Minlie [1]
Ding, Shilin [1]
Wang, Hongning [1]
Zhu, Xiaoyan [1]
Affiliations
[1] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Tsinghua Natl Lab Informat Sci & Technol, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Source
GENOME BIOLOGY | 2008 / Vol. 9
Funding
National High Technology Research and Development Program of China (863 Program);
DOI
10.1186/gb-2008-9-S2-S12
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Deciphering physical protein-protein interactions is fundamental to elucidating both the functions of proteins and biological processes. The development of high-throughput experimental technologies such as yeast two-hybrid screening has produced an explosion of interaction data. Because manual curation is costly in both time and money, there is an urgent need for text-mining tools that facilitate the extraction of such information. The BioCreative (Critical Assessment of Information Extraction systems in Biology) challenge evaluation provided common standards and shared evaluation criteria to enable comparisons among different approaches.
Results: In the benchmark evaluation of BioCreative 2006, all of our results ranked in the top three places. In the task of filtering articles irrelevant to physical protein interactions, our method achieved a precision of 75.07%, a recall of 81.07%, and an AUC (area under the receiver operating characteristic curve) of 0.847. In the task of identifying protein mentions and normalizing them to molecule identifiers, our method was competitive among the submitted runs, with a precision of 34.83%, a recall of 24.10%, and an F1 score of 28.5%. In extracting protein interaction pairs, our profile-based method was competitive on the SwissProt-only subset (precision = 36.95%, recall = 32.68%, and F1 score = 30.40%) and on the entire dataset (30.96%, 29.35%, and 26.20%, respectively). From the biologist's point of view, however, these results are far from satisfactory. The error analysis presented in this report provides insight into how performance could be improved: for this system, about three-quarters of false negatives were due to protein normalization problems (532/698), and about one-quarter were due to problems with correctly extracting interactions.
Conclusion: We present a text-mining framework to extract physical protein-protein interactions from the literature. Three key issues are addressed, namely filtering irrelevant articles, identifying protein names and normalizing them to molecule identifiers, and extracting protein-protein interactions. Our system is among the top three performers in the benchmark evaluation of BioCreative 2006. The tool will be helpful for manual interaction curation and can greatly facilitate the process of extracting protein-protein interactions.
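As a reading aid for the figures quoted in the abstract, the short Python sketch below recomputes the F1 score as the harmonic mean of precision and recall and the share of false negatives attributed to normalization. The function name `f1_score` is illustrative and is not taken from the paper. For the interaction-pair task, the harmonic mean of the reported precision and recall does not match the reported F1; one possible explanation, stated here as an assumption and not confirmed by the abstract, is that those scores were averaged per article before being reported.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Protein normalization task: precision and recall in percent, as quoted.
norm_f1 = f1_score(34.83, 24.10)
print(round(norm_f1, 1))  # 28.5, matching the reported F1 score

# Interaction pairs, SwissProt-only subset: the harmonic mean of the quoted
# precision and recall is higher than the reported 30.40.
# Assumption: the reported values are per-article averages.
pair_f1 = f1_score(36.95, 32.68)
print(round(pair_f1, 2))

# Share of false negatives attributed to protein normalization problems.
fn_share = 532 / 698
print(round(fn_share, 3))  # 0.762, i.e. about three-quarters
```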
Pages: 13