Extraction of protein interaction information from unstructured text using a context-free grammar

Cited by: 112
Authors
Temkin, JM [1]
Gilder, MR [1]
Affiliation
[1] GE Co, Global Res, Niskayuna, NY 12309 USA
DOI
10.1093/bioinformatics/btg279
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: As research into disease pathology and cellular function continues to generate vast amounts of data on protein, gene and small molecule (PGSM) interactions, there is a critical need to capture these results in structured formats that allow computational analysis. Although many efforts have been made to create databases that store this information in computer-readable form, populating them still largely requires manually interpreting and extracting interaction relationships from the biological research literature. Efficient and accurate automated extraction of interactions from unstructured text would greatly improve the content of these databases and provide a way to manage the continued growth of newly published literature.
Results: In this paper, we describe a system for extracting PGSM interactions from unstructured text. Using a lexical analyzer and a context-free grammar (CFG), we demonstrate that efficient parsers can be constructed to extract these relationships from natural language with high rates of recall and precision. The technique achieved a recall of 83.5% and a precision of 93.1% for recognizing PGSM names, and a recall of 63.9% and a precision of 70.2% for extracting interactions between these entities. In contrast to other published techniques, the use of a CFG significantly reduces the complexity of natural language processing by focusing on domain-specific structure rather than analyzing the semantics of a given language. Additionally, our approach provides a level of abstraction for adding new rules to extract other types of biological relationships beyond PGSM relationships.
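The abstract names the two components (a lexical analyzer and a CFG) but does not reproduce the grammar. As a rough illustration of the general idea only, the Python sketch below pairs a dictionary-based tokenizer with a two-rule grammar parsed by recursive descent. The lexicons, the rule names (`entity_list`, `interaction`) and all function names are assumptions made for this example and are not taken from the paper.

```python
import re
from itertools import product

# Hypothetical toy lexicons (assumptions, not from the paper); a real
# system would need far larger dictionaries and name-recognition rules.
ENTITIES = {"raf1", "mek1", "erk2", "p53", "mdm2"}
VERBS = {"phosphorylates", "binds", "inhibits", "activates"}
CONJUNCTIONS = {"and", ","}


def tokenize(text):
    """Lexical analysis: map each word to a (token_type, value) pair."""
    tokens = []
    for word in re.findall(r"[A-Za-z0-9\-]+|,", text):
        lower = word.lower()
        if lower in ENTITIES:
            tokens.append(("ENTITY", word))
        elif lower in VERBS:
            tokens.append(("VERB", lower))
        elif lower in CONJUNCTIONS:
            tokens.append(("CONJ", lower))
        else:
            tokens.append(("OTHER", word))
    return tokens


def parse_entity_list(tokens, pos):
    """Rule: entity_list -> ENTITY (CONJ ENTITY)*"""
    if pos >= len(tokens) or tokens[pos][0] != "ENTITY":
        return None, pos
    names = [tokens[pos][1]]
    pos += 1
    while (pos + 1 < len(tokens)
           and tokens[pos][0] == "CONJ"
           and tokens[pos + 1][0] == "ENTITY"):
        names.append(tokens[pos + 1][1])
        pos += 2
    return names, pos


def parse_interaction(tokens, pos):
    """Rule: interaction -> entity_list VERB entity_list"""
    agents, pos = parse_entity_list(tokens, pos)
    if agents is None or pos >= len(tokens) or tokens[pos][0] != "VERB":
        return None, pos
    verb = tokens[pos][1]
    targets, end = parse_entity_list(tokens, pos + 1)
    if targets is None:
        return None, end
    # Emit one (agent, verb, target) triple per agent/target pair.
    return [(a, verb, t) for a, t in product(agents, targets)], end


def extract_interactions(text):
    """Try the interaction rule at each token; skip past each match."""
    tokens = tokenize(text)
    results, pos = [], 0
    while pos < len(tokens):
        triples, end = parse_interaction(tokens, pos)
        if triples:
            results.extend(triples)
            pos = end
        else:
            pos += 1
    return results


print(extract_interactions("Raf1 phosphorylates MEK1 and ERK2 in vitro."))
# [('Raf1', 'phosphorylates', 'MEK1'), ('Raf1', 'phosphorylates', 'ERK2')]
```

In this sketch, supporting a new relationship type would mean adding a token class and a rule function rather than changing the scanning loop, which is one way to read the abstract's remark about adding new rules.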
Pages: 2046-2053
Page count: 8