NAMED-ENTITY RECOGNITION FOR HINDI LANGUAGE USING CONTEXT PATTERN-BASED MAXIMUM ENTROPY

Cited by: 2
Authors
Jain, Arti [1]
Yadav, Divakar [2]
Arora, Anuja [1]
Tayal, Devendra K. [3]
Affiliations
[1] Jaypee Inst Informat Technol, Noida, Uttar Pradesh, India
[2] NIT Hamirpur, Hamirpur, Himachal Pradesh, India
[3] Indira Gandhi Delhi Tech Univ Women, New Delhi, India
Source
COMPUTER SCIENCE-AGH | 2022, Vol. 23, No. 1
Keywords
context patterns; gazetteer lists; Hindi language; Kaggle dataset; maximum entropy; named-entity recognition; feature extension; HYBRID APPROACH; SYSTEM
DOI
10.7494/csci.2022.23.1.3977
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper describes a named-entity recognition (NER) system for the Hindi language that uses two methodologies: an existing baseline maximum entropy-based named-entity (BL-MENE) model, and the proposed context pattern-based MENE (CP-MENE) framework. BL-MENE uses several baseline features for the NER task but suffers from inaccurate named-entity (NE) boundary detection, misclassification errors, and partial recognition of NEs because certain essential features are missing. The CP-MENE-based NER task, by contrast, incorporates extensive features and patterns that are designed to overcome these problems. CP-MENE's features include right-boundary, left-boundary, part-of-speech, synonym, gazetteer, and relative pronoun features. CP-MENE formulates a recursive relationship for extracting highly ranked NE patterns, which are generated through regular expressions in Python code. Since Hindi-language web content is growing (especially in health care applications), this work is conducted on the Hindi health data (HHD) corpus, which is readily available on Kaggle. The experiments were conducted on four NE categories: Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP).
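To make the boundary and gazetteer features named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. The function name `token_features`, the feature keys, the `<BOS>`/`<EOS>` markers, the romanized toy sentence, and the tiny gazetteers are all hypothetical assumptions introduced here for illustration.

```python
def token_features(tokens, i, gazetteers):
    """Build a feature dict for tokens[i] (hypothetical feature names).

    "left" and "right" stand in for left- and right-boundary context;
    each "in_gaz_<label>" flag records membership in a gazetteer list.
    """
    word = tokens[i]
    feats = {
        "word": word,
        "left": tokens[i - 1] if i > 0 else "<BOS>",
        "right": tokens[i + 1] if i + 1 < len(tokens) else "<EOS>",
    }
    for label, entries in gazetteers.items():
        feats["in_gaz_" + label] = word in entries
    return feats


# Toy romanized sentence and toy gazetteers (assumptions, not corpus data).
tokens = ["doctor", "ne", "bukhar", "ke", "liye", "paracetamol", "di"]
gazetteers = {"SMP": {"bukhar"}, "CNS": {"paracetamol"}}

features = [token_features(tokens, i, gazetteers) for i in range(len(tokens))]
```

Feature dictionaries of this kind are the usual input to a maximum entropy classifier, which is equivalent to multinomial logistic regression; the paper's part-of-speech, synonym, relative pronoun, and context-pattern features are not sketched here.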
Pages: 81-115
Page count: 35
Related papers
50 in total
  • [21] Improving feature extraction in named entity recognition based on maximum entropy model
    Jiang, Wei
    Guan, Yi
    Wang, Xiao-Long
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 2630 - +
  • [22] Named Entity Recognition in Hindi Using Hidden Markov Model
    Chopra, Deepti
    Joshi, Nisheeth
    Mathur, Iti
    2016 SECOND INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE & COMMUNICATION TECHNOLOGY (CICT), 2016, : 581 - 586
  • [23] Thai Named-Entity Recognition Using Class-based Language Modeling on Multiple-sized Subword Units
    Saykhum, Kwanchiva
    Boonpiam, Vataya
    Thatphithakkul, Nattanun
    Wutiwiwatchai, Chai
    Natthee, Cholwich
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1586 - +
  • [24] GoalBERT: A Lightweight Named-Entity Recognition Model Based on Multiple Fusion
    Xu, Yingjie
    Tan, Xiaobo
    Wang, Mengxuan
    Zhang, Wenbo
    APPLIED SCIENCES-BASEL, 2024, 14 (23):
  • [25] Ontology Extraction from Software Requirements Using Named-Entity Recognition
    Kocerka, Jerzy
    Krzeslak, Michal
    Galuszka, Adam
    ADVANCES IN SCIENCE AND TECHNOLOGY-RESEARCH JOURNAL, 2022, 16 (03) : 207 - 212
  • [26] Knowledge-Augmented Language Model and Its Application to Unsupervised Named-Entity Recognition
    Liu, Angli
    Du, Jingfei
    Stoyanov, Veselin
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 1142 - 1150
  • [27] Using machine learning to maintain rule-based named-entity recognition and classification systems
    Petasis, G
    Vichot, F
    Wolinski, F
    Paliouras, G
    Karkaletsis, V
    Spyropoulos, CD
    39TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2001, : 418 - 425
  • [28] Named-Entity Recognition on Indonesian Tweets using Bidirectional LSTM-CRF
    Wintaka, Deni Cahya
    Bijaksana, Moch Arif
    Asror, Ibnu
    4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE (ICCSCI 2019) : ENABLING COLLABORATION TO ESCALATE IMPACT OF RESEARCH RESULTS FOR SOCIETY, 2019, 157 : 221 - 228
  • [29] Named entity recognition in Bengali and Hindi using support vector machine
    Ekbal, Asif
    Bandyopadhyay, Sivaji
    LINGUISTICAE INVESTIGATIONES, 2011, 34 (01) : 35 - 67
  • [30] ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition
    Boudjellal, Nada
    Zhang, Huaping
    Khan, Asif
    Ahmad, Arshad
    Naseem, Rashid
    Shang, Jianyun
    Dai, Lin
    COMPLEXITY, 2021, 2021