Protein Name Recognition Based on Dictionary Mining and Heuristics

Authors
Lin, Shian-Hua [1]
Ding, Shao-Hong [1]
Zeng, Wei-Sheng [1]
Affiliations
[1] Natl Chi Nan Univ, Dept Comp Sci & Informat Engn, Puli 545, Nantou Hsien, Taiwan
Keywords
protein name recognition; association mining; dictionary mining; heuristics; GENE; TEXT; IDENTIFICATION; PATTERNS
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We propose a novel method that integrates dictionary-based, heuristic, and data mining approaches to recognize exact protein names in the literature efficiently and effectively. Using a protein name dictionary and heuristic rules published in related studies, the core tokens of protein names can be detected efficiently. However, the exact boundaries of protein names are hard to identify. By regarding the tokens of a protein name as items within a transaction, we apply association mining to discover significant sequential patterns (SSPs) from the protein name dictionary. Based on the SSPs, protein name parts are extended from the core tokens to the left and right boundaries so that the full protein name is recognized correctly. On the Yapex101 corpus, the Protein Name Recognition System (PNRS) achieves an F-score of 74.49%, which is higher than the scores reported for existing systems and in prior papers.
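The boundary-extension idea in the abstract can be illustrated with a minimal Python sketch. It is not the PNRS implementation: the abstract does not define how SSPs are scored, so the sketch assumes a simple stand-in in which an SSP is an adjacent token pair that occurs in at least `min_support` dictionary names. The function names, the toy dictionary, and the threshold are all hypothetical.

```python
from collections import Counter


def mine_adjacent_patterns(names, min_support=2):
    """Treat each dictionary name as one transaction of tokens and keep
    adjacent token pairs that occur in at least min_support names.
    (Assumed stand-in for the paper's significant sequential patterns.)"""
    counts = Counter()
    for name in names:
        tokens = name.lower().split()
        # set() counts each pair at most once per name (per transaction).
        counts.update(set(zip(tokens, tokens[1:])))
    return {pair for pair, n in counts.items() if n >= min_support}


def extend_from_core(tokens, core_index, patterns):
    """Grow a span left and right from a core token for as long as the
    neighbouring token pair is a mined pattern."""
    lowered = [t.lower() for t in tokens]
    left = right = core_index
    while left > 0 and (lowered[left - 1], lowered[left]) in patterns:
        left -= 1
    while right < len(tokens) - 1 and (lowered[right], lowered[right + 1]) in patterns:
        right += 1
    return tokens[left:right + 1]


# Hypothetical toy dictionary; a real system would load a protein name resource.
toy_dictionary = [
    "tumor necrosis factor alpha",
    "tumor necrosis factor receptor",
    "nerve growth factor receptor",
    "epidermal growth factor receptor",
]
patterns = mine_adjacent_patterns(toy_dictionary, min_support=2)

sentence = "Binding of tumor necrosis factor receptor was reduced".split()
core = sentence.index("factor")  # assume a dictionary or heuristic step found this core token
span = extend_from_core(sentence, core, patterns)
print(" ".join(span))
```

In this sketch the span stops growing as soon as a neighbouring pair falls below the support threshold, which is one simple way to approximate left and right boundaries; the actual system may use longer patterns or different significance measures.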
Pages: 75-87
Page count: 13