Detection of Protein Subcellular Localization based on a Full Syntactic Parser and Semantic Information

Cited by: 2
Authors
Kim, Mi-Young [1]
Affiliation
[1] Sungshin Womens Univ, Sch Engn & Comp Sci, Seoul 136742, South Korea
Keywords
DOI
10.1109/FSKD.2008.529
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
A protein's subcellular localization is considered an essential part of the description of its associated biomolecular phenomena. As the volume of biomolecular reports has increased, there has been a great deal of research on text mining to detect protein subcellular localization information in documents. It has been argued that linguistic information, especially syntactic information, is useful for identifying the subcellular localizations of proteins of interest. However, previous systems for detecting protein subcellular localization information used only shallow syntactic parsers, and showed poor performance. Thus, there remains a need to use a full syntactic parser and to apply deep linguistic knowledge to the analysis of text for protein subcellular localization information. In addition, we have attempted to use semantic information from the WordNet thesaurus. To improve performance in detecting protein subcellular localization information, this paper proposes a three-step method based on a full syntactic dependency parser and semantic information. In the first step, we construct syntactic dependency paths from each protein to its location candidate. In the second step, we retrieve root information of the syntactic dependency paths. In the final step, we extract syn-semantic patterns of protein subtrees and location subtrees. From the root and subtree nodes, we extract syntactic category and syntactic direction as syntactic information, and synset offset of the WordNet thesaurus as semantic information. According to the root information and syn-semantic patterns of subtrees, we extract (protein, localization) pairs. Even with no biomolecular knowledge, our method shows reasonable performance in experimental results using Medline abstract data. In fact, our proposed method gave an F-measure of 74.53% for training data and 58.90% for test data, significantly outperforming previous methods, by 12-25%.
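The first two steps described in the abstract (building a dependency path from a protein to a location candidate, then reading off the root of that path) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the example sentence, the hand-assigned heads and category labels in `TREE`, and the helper names `path_to_top` and `dependency_path` do not come from the paper. The semantic step (WordNet synset offsets) is omitted, and the paper's actual parser and pattern format are not reproduced.

```python
from typing import Dict, List, Optional, Tuple

# Toy dependency tree for the made-up sentence "ProtX is localized in the nucleus".
# Each token maps to (head token, syntactic category); the sentence root has head None.
# Heads and categories are hand-assigned for illustration, not produced by a parser.
TREE: Dict[str, Tuple[Optional[str], str]] = {
    "localized": (None, "VERB"),
    "ProtX": ("localized", "NOUN"),
    "is": ("localized", "AUX"),
    "in": ("localized", "PREP"),
    "nucleus": ("in", "NOUN"),
    "the": ("nucleus", "DET"),
}


def path_to_top(token: str, tree: Dict[str, Tuple[Optional[str], str]]) -> List[str]:
    """Return the chain of tokens from `token` up to the sentence root."""
    chain = [token]
    while tree[chain[-1]][0] is not None:
        chain.append(tree[chain[-1]][0])
    return chain


def dependency_path(
    protein: str, location: str, tree: Dict[str, Tuple[Optional[str], str]]
) -> Tuple[str, List[str], List[str]]:
    """Split the protein-to-location path at its root (lowest common ancestor)."""
    up = path_to_top(protein, tree)
    down = path_to_top(location, tree)
    ancestors = set(down)
    # The first token on the protein's upward chain that also dominates the location.
    root = next(tok for tok in up if tok in ancestors)
    protein_side = up[: up.index(root)]
    location_side = down[: down.index(root)]
    return root, protein_side, location_side


root, protein_side, location_side = dependency_path("ProtX", "nucleus", TREE)
root_category = TREE[root][1]
# Category sequence along the location side, read from the candidate up toward the root.
location_categories = [TREE[tok][1] for tok in location_side]
```

In this toy tree the path root is the verb that governs both the protein and the prepositional phrase containing the location, which is the kind of root information the second step refers to. A real system would take the tree from a full dependency parser and would attach semantic labels to the nodes as well.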
Pages: 407 - 411
Number of pages: 5