NAMED-ENTITY RECOGNITION FOR HINDI LANGUAGE USING CONTEXT PATTERN-BASED MAXIMUM ENTROPY

Cited by: 2
Authors
Jain, Arti [1 ]
Yadav, Divakar [2 ]
Arora, Anuja [1 ]
Tayal, Devendra K. [3 ]
Affiliations
[1] Jaypee Inst Informat Technol, Noida, Uttar Pradesh, India
[2] NIT Hamirpur, Hamirpur, Himachal Pradesh, India
[3] Indira Gandhi Delhi Tech Univ Women, New Delhi, India
Source
COMPUTER SCIENCE-AGH | 2022 / Vol. 23 / No. 01
Keywords
context patterns; gazetteer lists; Hindi language; Kaggle dataset; maximum entropy; named-entity recognition; feature extension; HYBRID APPROACH; SYSTEM;
DOI
10.7494/csci.2022.23.1.3977
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper describes a named-entity recognition (NER) system for the Hindi language that uses two methodologies: an existing baseline maximum entropy-based named-entity (BL-MENE) model and the proposed context pattern-based MENE (CP-MENE) framework. BL-MENE uses several baseline features for the NER task but suffers from inaccurate named-entity (NE) boundary detection, misclassification errors, and partial recognition of NEs because certain essential features are missing. CP-MENE, by contrast, incorporates extensive features and patterns intended to overcome these problems: right-boundary, left-boundary, part-of-speech, synonym, gazetteer, and relative-pronoun features. CP-MENE formulates a recursive relationship for extracting highly ranked NE patterns, which are generated with regular expressions in Python (C) code. Because Hindi web content is growing (especially in health-care applications), this work uses the Hindi health data (HHD) corpus, which is readily available as a Kaggle dataset. Our experiments cover four NE categories: Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP).
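To make the combination of features and a maximum entropy classifier more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' CP-MENE implementation. The gazetteer entries, the relative-pronoun list, the context pattern `(\S+) से ग्रस्त` ("suffering from X"), the reduced label set, the toy training sentences, and every function and class name are hypothetical illustrations. Part-of-speech and synonym features are omitted because they need an external tagger and lexicon. Tokenization is naive whitespace splitting. The "maximum entropy" model is plain multinomial logistic regression trained by stochastic gradient ascent without regularization, which may differ from the paper's training setup.

```python
import math
import re
from collections import defaultdict

# HYPOTHETICAL resources for illustration only; not the paper's gazetteers.
GAZETTEER = {"DIS": {"मधुमेह", "बुखार"}, "PER": {"राम"}}
RELATIVE_PRONOUNS = {"जो", "जिसे", "जिसका"}
# HYPOTHETICAL context pattern: "<X> से ग्रस्त" ("suffering from <X>") hints that X is a disease.
PATTERN = re.compile(r"(\S+) से ग्रस्त")
LABELS = ("PER", "DIS", "O")  # reduced, hypothetical label set

# HYPOTHETICAL toy training data: (whitespace tokens, one label per token).
TRAIN = [
    ("राम को बुखार है".split(), ["PER", "O", "DIS", "O"]),
    ("सीता मधुमेह से ग्रस्त है".split(), ["PER", "DIS", "O", "O", "O"]),
]


def token_features(tokens, i, pattern_hits):
    """Binary features for token i: the word itself, left/right boundary words,
    gazetteer membership, a relative pronoun on the left, and a context-pattern hit."""
    w = tokens[i]
    left = tokens[i - 1] if i > 0 else "<S>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</S>"
    feats = ["bias", f"w={w}", f"left={left}", f"right={right}"]
    feats += [f"gaz={lab}" for lab, words in GAZETTEER.items() if w in words]
    if left in RELATIVE_PRONOUNS:
        feats.append("left_relpron")
    if w in pattern_hits:
        feats.append("ctx_pattern")
    return feats


def sentence_features(tokens):
    """Run the regex context pattern over the whole sentence, then featurize each token."""
    hits = set(PATTERN.findall(" ".join(tokens)))
    return [token_features(tokens, i, hits) for i in range(len(tokens))]


class MaxEnt:
    """Multinomial logistic regression (maximum entropy) over sparse binary features."""

    def __init__(self, labels=LABELS):
        self.labels = list(labels)
        self.w = defaultdict(float)  # one weight per (feature, label) pair

    def probs(self, feats):
        # p(y | x) = exp(sum_f w[f, y]) / Z(x); subtract the max score for numerical stability
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return {y: e / z for y, e in zip(self.labels, exps)}

    def train(self, sentences, epochs=200, lr=0.1):
        # Stochastic gradient ascent on the conditional log-likelihood (no regularization):
        # the gradient for w[f, y] is (observed indicator - model probability).
        for _ in range(epochs):
            for tokens, gold in sentences:
                for feats, g in zip(sentence_features(tokens), gold):
                    p = self.probs(feats)
                    for y in self.labels:
                        step = lr * ((1.0 if y == g else 0.0) - p[y])
                        for f in feats:
                            self.w[(f, y)] += step

    def predict(self, tokens):
        # Pick the most probable label independently for each token
        return [max(p, key=p.get) for p in map(self.probs, sentence_features(tokens))]


if __name__ == "__main__":
    model = MaxEnt()
    model.train(TRAIN)
    print(model.predict("सीता मधुमेह से ग्रस्त है".split()))
```

Each token is labeled independently here; a fuller system would also need boundary-aware tags (for multi-word NEs) and the ranking of automatically extracted patterns that the abstract describes, neither of which this sketch attempts.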
Pages: 81-115
Number of pages: 35