NAMED-ENTITY RECOGNITION FOR HINDI LANGUAGE USING CONTEXT PATTERN-BASED MAXIMUM ENTROPY

Cited by: 2

Authors:

Jain, Arti [1]

Yadav, Divakar [2]

Arora, Anuja [1]

Tayal, Devendra K. [3]

Affiliations:

[1] Jaypee Inst Informat Technol, Noida, Uttar Pradesh, India

[2] NIT Hamirpur, Hamirpur, Himachal Pradesh, India

[3] Indira Gandhi Delhi Tech Univ Women, New Delhi, India

Source:

COMPUTER SCIENCE-AGH | 2022 / Vol. 23 / Issue 01

Keywords:

context patterns; gazetteer lists; Hindi language; Kaggle dataset; maximum entropy; named-entity recognition; feature extension; HYBRID APPROACH; SYSTEM;

DOI:

10.7494/csci.2022.23.1.3977

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

This paper describes a named-entity recognition (NER) system for the Hindi language that uses two methodologies: an existing baseline maximum entropy-based named-entity (BL-MENE) model, and the proposed context pattern-based MENE (CP-MENE) framework. BL-MENE utilizes several baseline features for the NER task but suffers from inaccurate named-entity (NE) boundary detection, misclassification errors, and the partial recognition of NEs because certain essentials are missing. The CP-MENE-based NER task, by contrast, incorporates extensive features and patterns that are designed to overcome these problems. CP-MENE's features include right-boundary, left-boundary, part-of-speech, synonym, gazetteer, and relative pronoun features. CP-MENE formulates a kind of recursive relationship for extracting highly ranked NE patterns that are generated through regular expressions in Python code. Because Hindi-language web content is growing (especially in health care applications), this work is conducted on the Hindi health data (HHD) corpus, which is readily available on Kaggle. Our experiments were conducted on four NE categories: Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP).

Pages: 81-115

Page count: 35

