NAMED-ENTITY RECOGNITION FOR HINDI LANGUAGE USING CONTEXT PATTERN-BASED MAXIMUM ENTROPY

Cited by: 2
Authors
Jain, Arti [1]
Yadav, Divakar [2]
Arora, Anuja [1]
Tayal, Devendra K. [3]
Affiliations
[1] Jaypee Inst Informat Technol, Noida, Uttar Pradesh, India
[2] NIT Hamirpur, Hamirpur, Himachal Pradesh, India
[3] Indira Gandhi Delhi Tech Univ Women, New Delhi, India
Source
COMPUTER SCIENCE-AGH | 2022 / Vol. 23 / Issue 01
Keywords
context patterns; gazetteer lists; Hindi language; Kaggle dataset; maximum entropy; named-entity recognition; feature extension; HYBRID APPROACH; SYSTEM;
DOI
10.7494/csci.2022.23.1.3977
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
This paper describes a named-entity recognition (NER) system for the Hindi language that uses two methodologies: an existing baseline maximum entropy-based named-entity (BL-MENE) model and the proposed context pattern-based MENE (CP-MENE) framework. BL-MENE uses several baseline features for the NER task but suffers from inaccurate named-entity (NE) boundary detection, misclassification errors, and partial recognition of NEs because certain essentials are missing. CP-MENE, by contrast, incorporates extensive features and patterns designed to overcome these problems. Its features include right-boundary, left-boundary, part-of-speech, synonym, gazetteer, and relative-pronoun features. CP-MENE formulates a kind of recursive relationship for extracting highly ranked NE patterns, which are generated through regular expressions in Python code. Since Hindi-language web content is growing (especially in health care applications), this work is conducted on the Hindi health data (HHD) corpus, which is readily available as a Kaggle dataset. The experiments cover four NE categories: Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP).
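The abstract mentions left- and right-boundary context and ranked NE patterns generated through regular expressions in Python. The following is only a minimal sketch of that general idea, not the authors' CP-MENE implementation: the romanized toy sentences, the `GAZETTEER` dictionary, and the helper names `context_patterns` and `pattern_to_regex` are all hypothetical, and ranking is done by plain frequency rather than by the recursive relationship the paper describes.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer (romanized for readability): surface form -> NE category.
GAZETTEER = {"malaria": "DIS", "bukhar": "SMP", "haldi": "CNS"}

# Hypothetical toy sentences standing in for a health corpus.
SENTENCES = [
    "rogi ko malaria ho gaya",
    "rogi ko bukhar ho gaya",
    "bachche ko malaria ho gaya",
    "roz haldi ka sevan karen",
]


def context_patterns(sentences, gazetteer, window=1):
    """Count (left context, NE label, right context) triples around gazetteer hits."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()  # whitespace tokenization (an assumption)
        for i, token in enumerate(tokens):
            label = gazetteer.get(token)
            if label is None:
                continue
            left = tuple(tokens[max(0, i - window):i])
            right = tuple(tokens[i + 1:i + 1 + window])
            counts[(left, label, right)] += 1
    return counts


def pattern_to_regex(left, right):
    """Compile a context pattern into a regex whose single group is the NE slot."""
    parts = [re.escape(t) for t in left] + [r"(\S+)"] + [re.escape(t) for t in right]
    # The lookarounds keep the match aligned to whitespace-delimited tokens.
    return re.compile(r"(?<!\S)" + r"\s+".join(parts) + r"(?!\S)")


patterns = context_patterns(SENTENCES, GAZETTEER)
(top_left, top_label, top_right), top_count = patterns.most_common(1)[0]

# Apply the most frequent pattern to an unseen sentence to propose a candidate NE.
regex = pattern_to_regex(top_left, top_right)
match = regex.search("mahila ko dengue ho gaya")
candidate = match.group(1) if match else None
print(top_label, top_count, candidate)
```

In this toy data the same context ("ko ... ho") also surrounds a Symptom, so a context pattern alone is ambiguous; that is one reason such patterns would plausibly serve as features for a classifier such as maximum entropy rather than as stand-alone rules.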
Pages: 81-115
Page count: 35