Development of Automatic Rule-based Semantic Tagger and Karaka Analyzer for Hindi

Cited by: 0

Authors:

Katyayan, Pragya [1]

Joshi, Nisheeth [1]

Affiliation:

[1] Banasthali Vidyapith, Dept Comp Sci, Vanasthali 304022, Rajasthan, India

Source:

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING | 2022 / Vol. 21 / Issue 02

Keywords:

Karaka analyzer; semantic tagging; feature extraction; language resource

DOI:

10.1145/3479155

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Hindi is the third most-spoken language in the world (615 million speakers) and has the fourth-highest number of native speakers (341 million). It is an inflectionally rich, relatively free word-order language with an immense vocabulary. Despite its prominence across the globe, very few Natural Language Processing (NLP) applications and tools have been developed to support it computationally. Moreover, most of the existing ones are not efficient enough because they lack semantic information (or contextual knowledge). Hindi grammar is based on Paninian grammar and derives most of its rules from it. Paninian grammar strongly emphasizes the role of karaka theory in free word-order languages. In this article, we present an application that extracts all possible karakas from simple Hindi sentences with an accuracy of M.2% and an F1 score of 88.5%. We consider features such as part-of-speech tags, post-position markers (vibhaktis), semantic tags for nouns, and syntactic structure to capture the context in different-sized word windows within a sentence. With the help of these features, we built a rule-based inference engine to extract karakas from a sentence. The application takes a text file of clean (punctuation-free) simple Hindi sentences as input and writes the karaka-tagged sentences to a separate text file as output.
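
To make the idea of a vibhakti-driven rule concrete, here is a minimal sketch in Python. It is not the rule set from the article: the lookup table is a simplified textbook mapping from post-position markers to karaka labels, the function name `tag_karakas` is hypothetical, and the POS labels (`NN`, `NNP`, `PSP`, `VM`) are assumed to follow a common Indian-language tagset. It only looks at a two-word window (a noun and the marker that follows it).

```python
# Simplified, assumed mapping from vibhakti (post-position marker) to karaka.
# This is an illustrative textbook table, NOT the rules used in the article.
VIBHAKTI_TO_KARAKA = {
    "ने": "karta",        # agent
    "को": "karma",        # object (can also mark a recipient)
    "से": "karana",       # instrument (can also mark a source)
    "में": "adhikarana",   # location "in"
    "पर": "adhikarana",    # location "on"
}

# Assumed POS labels for nouns and post-positions.
NOUN_TAGS = {"NN", "NNP"}
POSTPOSITION_TAG = "PSP"


def tag_karakas(tagged_tokens):
    """Return (noun, karaka) pairs for nouns directly followed by a known vibhakti.

    tagged_tokens is a list of (word, pos_tag) pairs for one sentence.
    Nouns without a following marker are left untagged by this sketch.
    """
    result = []
    # Slide a two-word window over the sentence: (current token, next token).
    for (word, pos), (next_word, next_pos) in zip(tagged_tokens, tagged_tokens[1:]):
        if pos in NOUN_TAGS and next_pos == POSTPOSITION_TAG:
            karaka = VIBHAKTI_TO_KARAKA.get(next_word)
            if karaka is not None:
                result.append((word, karaka))
    return result


# "Ram cut the fruit with a knife."
sentence = [
    ("राम", "NNP"), ("ने", "PSP"),
    ("चाकू", "NN"), ("से", "PSP"),
    ("फल", "NN"), ("काटा", "VM"),
]
print(tag_karakas(sentence))
```

A single lookup like this cannot resolve ambiguous markers such as से, which may signal either an instrument or a source. That is one plausible reason a system would also draw on semantic tags for nouns and wider word windows, as the abstract describes.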

Pages: 25

50 items in total

[1] The linguistic basis of a rule-based tagger of Czech
Oliva, K
Hnátková, M
Petkevic, V
Kveton, P
[J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2000, 1902 : 3 - 8
[2] A rule-based tagger for Polish based on genetic algorithm
Piasecki, M
Gawel, B
[J]. INTELLIGENT INFORMATION PROCESSING AND WEB MINING, PROCEEDINGS, 2005, : 247 - 255
[3] ReqTagger: A Rule-Based Tagger for Automatic Glossary of Terms Extraction from Ontology Requirements
Wisniewski, Dawid
Potoniec, Jedrzej
Lawrynowicz, Agnieszka
[J]. FOUNDATIONS OF COMPUTING AND DECISION SCIENCES, 2022, 47 (01) : 65 - 86
[4] HMM based POS tagger and rule-based chunker for Bengali
Bandyopadhyay, Sivaji
Ekbal, Asif
[J]. PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION, 2007, : 384 - +
[5] Rule-Based Automatic Question Generation Using Semantic Role Labeling
Keklik, Onur
Tuglular, Tugkan
Tekir, Selma
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (07) : 1362 - 1373
[6] Building an Indonesian Rule-Based Part-of-Speech Tagger
Rashel, Fam
Luthfi, Andry
Dinakaramani, Arawinda
Manurung, Ruli
[J]. PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2014), 2014, : 70 - 73
[7] Boosting Statistical Tagger Accuracy with Simple Rule-Based Grammars
Hulden, Mans
Francom, Jerid
[J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 2114 - 2117
[8] A Rule-based Morpho-semantic Analyzer of the Japanese Verb Phrases of Simple Sentences
Alam, Yukiko Sasaki
[J]. PACLIC 22: PROCEEDINGS OF THE 22ND PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, 2008, : 101 - 112
[9] A rule-based morpho-semantic analyzer of the Japanese verb phrases of simple sentences
Alam, Yukiko Sasaki
[J]. Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation, PACLIC 22, 2008, : 101 - 112
[10] A Rule-Based Morphosemantic Analyzer for French for a Fine-Grained Semantic Annotation of Texts
Namer, Fiammetta
[J]. SYSTEMS AND FRAMEWORKS FOR COMPUTATIONAL MORPHOLOGY, 2013, 380 : 92 - 114
