Development of Automatic Rule-based Semantic Tagger and Karaka Analyzer for Hindi

Times cited: 0
Authors
Katyayan, Pragya [1]
Joshi, Nisheeth [1]
Affiliations
[1] Banasthali Vidyapith, Dept Comp Sci, Vanasthali 304022, Rajasthan, India
Keywords
Karaka analyzer; semantic tagging; feature extraction; language resource
DOI
10.1145/3479155
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hindi is the third most-spoken language in the world (615 million speakers) and has the fourth-highest number of native speakers (341 million). It is an inflectionally rich, relatively free word-order language with an immense vocabulary. Despite its global prominence, very few Natural Language Processing (NLP) applications and tools have been developed to support it computationally. Moreover, most of the existing ones are not efficient enough because they lack semantic information (contextual knowledge). Hindi grammar is based on Paninian grammar and derives most of its rules from it. Paninian grammar strongly emphasizes the role of karaka theory in free word-order languages. In this article, we present an application that extracts all possible karakas from simple Hindi sentences with an accuracy of M.2% and an F1 score of 88.5%. We consider features such as part-of-speech tags, postposition markers (vibhaktis), semantic tags for nouns, and syntactic structure to capture the context in different-sized word windows within a sentence. With the help of these features, we built a rule-based inference engine to extract karakas from a sentence. The application takes as input a text file of clean (punctuation-free) simple Hindi sentences and writes the karaka-tagged sentences to a separate text file.
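To make the idea of a rule-based karaka extractor concrete, here is a minimal sketch in Python. It is not the authors' inference engine: it uses a single, simplified rule (a nominal directly followed by a postposition receives a default karaka label) and ignores the semantic tags and wider word windows the abstract mentions. The vibhakti-to-karaka table, the POS tag names (NN, NNP, PRP, PSP, VM), and the function name tag_karakas are all assumptions chosen for illustration.

```python
# Hypothetical default mapping from a postposition (vibhakti) to a karaka label.
# A real system needs context, since markers such as "से" and "को" are ambiguous.
VIBHAKTI_TO_KARAKA = {
    "ने": "karta",        # agent
    "को": "karma",        # object (simplified default)
    "से": "karana",       # instrument (simplified default)
    "में": "adhikarana",   # location
    "पर": "adhikarana",   # location
}

# Assumed POS tags for nominals and postpositions (illustrative tagset).
NOMINAL_TAGS = {"NN", "NNP", "PRP"}
POSTPOSITION_TAG = "PSP"


def tag_karakas(tagged_tokens):
    """Return (token, karaka) pairs for nominals directly followed by a known vibhakti."""
    results = []
    # Look at each token together with its right-hand neighbour (a window of two).
    for i in range(len(tagged_tokens) - 1):
        word, pos = tagged_tokens[i]
        next_word, next_pos = tagged_tokens[i + 1]
        if pos in NOMINAL_TAGS and next_pos == POSTPOSITION_TAG:
            karaka = VIBHAKTI_TO_KARAKA.get(next_word)
            if karaka is not None:
                results.append((word, karaka))
    return results


# "Ram cut the fruit with a knife", already POS-tagged and punctuation-free.
sentence = [
    ("राम", "NNP"), ("ने", "PSP"),
    ("चाकू", "NN"), ("से", "PSP"),
    ("फल", "NN"), ("काटा", "VM"),
]
print(tag_karakas(sentence))
```

In this sketch the unmarked noun "फल" receives no label, which illustrates why a surface-marker lookup alone is insufficient and why additional features such as semantic tags and syntactic structure would be needed to resolve unmarked or ambiguous cases.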
Pages: 25