Mining physical protein-protein interactions from the literature

Cited by: 18
Authors
Huang, Minlie [1]
Ding, Shilin [1]
Wang, Hongning [1]
Zhu, Xiaoyan [1]
Affiliation
[1] Tsinghua University, State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Beijing 100084, People's Republic of China
Source
Genome Biology, 2008, Vol. 9, Suppl. 2
Funding
National High Technology Research and Development Program of China (863 Program)
DOI
10.1186/gb-2008-9-S2-S12
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Deciphering physical protein-protein interactions is fundamental to elucidating both the functions of proteins and the biological processes in which they take part. High-throughput experimental technologies such as yeast two-hybrid screening have produced an explosion of interaction data. Because manual curation is costly in both time and money, text-mining tools that facilitate the extraction of such information are urgently needed. The BioCreative (Critical Assessment of Information Extraction systems in Biology) challenge evaluation provided common standards and shared evaluation criteria that enable comparison among different approaches.

Results: In the BioCreative 2006 benchmark evaluation, all of our results ranked among the top three. In the task of filtering out articles irrelevant to physical protein interactions, our method achieved a precision of 75.07%, a recall of 81.07%, and an AUC (area under the receiver operating characteristic curve) of 0.847. In the task of identifying protein mentions and normalizing them to molecule identifiers, our method was competitive among the submitted runs, with a precision of 34.83%, a recall of 24.10%, and an F1 score of 28.5%. In extracting protein interaction pairs, our profile-based method was competitive both on the SwissProt-only subset (precision = 36.95%, recall = 32.68%, F1 score = 30.40%) and on the entire dataset (30.96%, 29.35%, and 26.20%, respectively). From a biologist's point of view, however, these results are far from satisfactory. The error analysis presented in this report provides insight into how performance could be improved: about three-quarters of the false negatives (532/698) were due to protein normalization problems, and about one-quarter were due to this system's failure to correctly extract the interactions themselves.

Conclusion: We present a text-mining framework for extracting physical protein-protein interactions from the literature. It addresses three key issues: filtering out irrelevant articles, identifying protein names and normalizing them to molecule identifiers, and extracting protein-protein interactions. Our system was among the top three performers in the BioCreative 2006 benchmark evaluation. The tool will be helpful for manual curation of interactions and can greatly facilitate the extraction of protein-protein interactions.
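The F1 score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). For the normalization run, this reproduces the stated 28.5% from 34.83% and 24.10%. For the SwissProt-only interaction-pair run, however, the harmonic mean of 36.95% and 32.68% is about 34.7%, above the reported 30.40%. A plausible explanation, assumed here and not stated in this record, is that scores were computed per article and then averaged (macro-averaging), in which case the mean F1 need not equal the F1 of the mean P and mean R. The minimal Python sketch below illustrates this; `ArticleCounts`, `article_scores` and `macro_average` are hypothetical helpers, and the per-article counts are invented toy data, not results from the paper.

```python
from dataclasses import dataclass


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


@dataclass
class ArticleCounts:
    """Hypothetical per-article counts of predicted vs. gold interaction pairs."""
    tp: int  # correctly extracted pairs
    fp: int  # extracted pairs absent from the gold standard
    fn: int  # gold-standard pairs that were missed


def article_scores(c: ArticleCounts) -> tuple:
    """Precision, recall and F1 for a single article."""
    p = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    r = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    return p, r, f1(p, r)


def macro_average(articles: list) -> tuple:
    """Average P, R and F1 over articles, weighting each article equally."""
    scores = [article_scores(a) for a in articles]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


if __name__ == "__main__":
    # Normalization run: the harmonic mean reproduces the reported F1 (~28.5%).
    print(f"{f1(0.3483, 0.2410):.4f}")  # 0.2849
    # SwissProt-only pair run: the harmonic mean is ~34.7%, not 30.40%.
    print(f"{f1(0.3695, 0.3268):.4f}")  # 0.3468
    # Toy data: one high-precision article and one high-recall article.
    toy = [ArticleCounts(tp=1, fp=0, fn=4), ArticleCounts(tp=1, fp=4, fn=0)]
    p, r, mean_f1 = macro_average(toy)
    print(f"{p:.2f} {r:.2f} {mean_f1:.4f} {f1(p, r):.2f}")  # 0.60 0.60 0.3333 0.60
```

In the toy data, each article has an F1 of only 1/3, so the macro-averaged F1 is 0.3333, while the harmonic mean of the averaged precision and recall (both 0.60) is 0.60. Averaging per article therefore penalizes systems whose errors are unbalanced within individual articles, which is consistent with, though not proof of, the gap seen in the reported pair-extraction scores.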
Pages: 13