NAMED-ENTITY RECOGNITION FOR HINDI LANGUAGE USING CONTEXT PATTERN-BASED MAXIMUM ENTROPY

Cited by: 2
Authors
Jain, Arti [1]
Yadav, Divakar [2]
Arora, Anuja [1]
Tayal, Devendra K. [3]
Affiliations
[1] Jaypee Inst Informat Technol, Noida, Uttar Pradesh, India
[2] NIT Hamirpur, Hamirpur, Himachal Pradesh, India
[3] Indira Gandhi Delhi Tech Univ Women, New Delhi, India
Source
COMPUTER SCIENCE-AGH | 2022 / Vol. 23 / Issue 01
Keywords
context patterns; gazetteer lists; Hindi language; Kaggle dataset; maximum entropy; named-entity recognition; feature extension; HYBRID APPROACH; SYSTEM;
DOI
10.7494/csci.2022.23.1.3977
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
This paper describes a named-entity recognition (NER) system for the Hindi language that uses two methodologies: an existing baseline maximum entropy-based named-entity (BL-MENE) model and the proposed context pattern-based MENE (CP-MENE) framework. BL-MENE uses several baseline features for the NER task but suffers from inaccurate named-entity (NE) boundary detection, misclassification errors, and partial recognition of NEs because certain essentials are missing. CP-MENE, by contrast, incorporates extensive features and patterns designed to overcome these problems. Its features include right-boundary, left-boundary, part-of-speech, synonym, gazetteer, and relative-pronoun features. CP-MENE formulates a recursive relationship for extracting highly ranked NE patterns, which are generated through regular expressions in Python code. Because Hindi-language web content is growing (especially in health care applications), this work is conducted on the Hindi health data (HHD) corpus, which is available as a Kaggle dataset. The experiments cover four NE categories: Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP).
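The abstract mentions left-boundary and right-boundary context patterns that are generated through regular expressions and then ranked. The following is a minimal sketch of that general idea, not the authors' CP-MENE implementation: it counts the tokens surrounding gazetteer matches and ranks the resulting patterns by frequency. The gazetteer, the romanized sentences, and the names `GAZETTEER`, `SENTENCES`, and `context_patterns` are all hypothetical and do not come from the HHD corpus.

```python
import re
from collections import Counter

# Hypothetical toy data (romanized for readability); not taken from the HHD corpus.
GAZETTEER = {"DIS": ["malaria", "dengue"]}
SENTENCES = [
    "mareez ko malaria ke lakshan hain",
    "bachche ko dengue ke lakshan hain",
    "use malaria se bukhar hai",
]


def context_patterns(sentences, gazetteer, window=1):
    """Count (left context, label, right context) patterns around gazetteer hits."""
    counts = Counter()
    for label, names in gazetteer.items():
        alternation = "|".join(re.escape(name) for name in names)
        # Capture `window` whitespace-delimited tokens on each side of the entity.
        pattern = re.compile(
            rf"((?:\S+\s+){{{window}}})({alternation})((?:\s+\S+){{{window}}})"
        )
        for sentence in sentences:
            for match in pattern.finditer(sentence):
                left = match.group(1).strip()
                right = match.group(3).strip()
                counts[(left, label, right)] += 1
    return counts


# Rank patterns from most to least frequent.
ranked = context_patterns(SENTENCES, GAZETTEER).most_common()
for (left, label, right), freq in ranked:
    print(f"{left} <{label}> {right}: {freq}")
```

In this sketch, a highly ranked pattern such as `ko <DIS> ke` could serve as one extra feature for a maximum entropy classifier alongside the part-of-speech and gazetteer features; how CP-MENE actually scores and recursively extends its patterns is not specified in the abstract.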
Pages: 81-115
Page count: 35