Combining terminology resources and statistical methods for entity recognition: an evaluation

Cited by: 0
Authors
Roberts, Angus [1]
Gaizauskas, Robert [1]
Hepple, Mark [1]
Guo, Yikun [1]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics];
Discipline classification code
030303; 0501; 050102;
Abstract
Terminologies and other knowledge resources are widely used to aid entity recognition in specialist domain texts. As well as providing lexicons of specialist terms, linkage from the text back to a resource can make additional knowledge available to applications. Use of such resources is especially pertinent in the biomedical domain, where large numbers of these resources are available, and where they are widely used in informatics applications. Terminology resources can be most readily used by simple lexical lookup of terms in the text. A major drawback of such lexical lookup, however, is poor precision caused by ambiguity between domain terms and general language words. We combine lexical lookup with simple filtering of ambiguous terms to improve precision. We compare this lexical lookup with a statistical method of entity recognition, and with a method that combines the two approaches. We show that the combined method boosts precision with little loss of recall, and that linkage from recognised entities back to the domain knowledge resources can be maintained.
Pages: 2974 - 2980
Page count: 7
Related papers
50 in total
  • [1] Chemical entity recognition in patents by combining dictionary-based and statistical approaches
    Akhondi, Saber A.
    Pons, Ewoud
    Afzal, Zubair
    van Haagen, Herman
    Becker, Benedikt F. H.
    Hettne, Kristina M.
    van Mulligen, Erik M.
    Kors, Jan A.
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2016,
  • [2] Statistical dataset evaluation: A case study on named entity recognition
    Wang, Chengwen
    Dong, Qingxiu
    Wang, Xiaochen
    Sui, Zhifang
    NATURAL LANGUAGE PROCESSING, 2025, 31 (01) : 90 - 110
  • [3] An evaluation of statistical methods in handwritten hangul recognition
    Park, Gyu-Ro
    Kim, In-Jung
    Liu, Cheng-Lin
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2013, 16 (03) : 273 - 283
  • [4] An evaluation of statistical methods in handwritten hangul recognition
    Gyu-Ro Park
    In-Jung Kim
    Cheng-Lin Liu
    International Journal on Document Analysis and Recognition (IJDAR), 2013, 16 : 273 - 283
  • [5] IsiXhosa Named Entity Recognition Resources
    Eiselen, Roald
    Bukula, Andiswa
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (02)
  • [6] Combining word-based features, statistical language models, and parsing for named entity recognition
    Polifroni, Joseph
    Seneff, Stephanie
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 1289 - +
  • [7] Combining rule-based and statistical mechanisms for low-resource named entity recognition
    Gabbard, Ryan
    DeYoung, Jay
    Lignos, Constantine
    Freedman, Marjorie
    Weischedel, Ralph
    MACHINE TRANSLATION, 2018, 32 (1-2) : 31 - 43
  • [8] Combining Contextualized Embeddings and Prior Knowledge for Clinical Named Entity Recognition: Evaluation Study
    Jiang, Min
    Sanger, Todd
    Liu, Xiong
    JMIR MEDICAL INFORMATICS, 2019, 7 (04) : 80 - 94
  • [9] Recognition of on-line cursive Korean characters combining statistical and structural methods
    Kwon, JO
    Sin, B
    Kim, JH
    PATTERN RECOGNITION, 1997, 30 (08) : 1255 - 1263
  • [10] Terminology access methods leveraging LDAP resources
    Solbrig, HR
    Chute, CG
    MEDINFO 2004: PROCEEDINGS OF THE 11TH WORLD CONGRESS ON MEDICAL INFORMATICS, PT 1 AND 2, 2004, 107 : 545 - 549