An Accurate Persian Part-of-Speech Tagger

Cited by: 0
Authors
Okhovvat, Morteza [1]
Sharifi, Mohsen [2]
Bidgoli, Behrouz Minaei [2]
Affiliations
[1] Golestan Univ Med Sci, Hlth Management & Social Dev Res Ctr, Gorgan, Golestan, Iran
[2] Iran Univ Sci & Technol, Sch Comp Engn, Tehran, Iran
Keywords
Hidden Markov Model; Maximum Likelihood Estimation; Morphology; POS Tagger; Viterbi Algorithm; LANGUAGE;
DOI
10.32604/csse.2020.35.423
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Processing any natural language requires that the grammatical properties of each word be tagged by a part-of-speech (POS) tagger. We propose an improved Persian POS tagger, called IAoM, that supports properties needed by text-to-speech systems, such as lexical stress search, homograph disambiguation, and break phrase detection, as well as the main aspects of Persian morphology. IAoM uses Maximum Likelihood Estimation (MLE) to determine the tags of unknown words and applies a few predefined rules to raise accuracy further. To tag the input corpus, IAoM uses a Hidden Markov Model (HMM) together with the Viterbi algorithm. For a fair evaluation, we ran experiments on both homogeneous and heterogeneous Persian corpora and studied how the size of the training set affects the accuracy of IAoM. The experimental results show that the proposed tagger achieves an overall accuracy of 97.6%.
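The HMM-plus-Viterbi pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the IAoM implementation: the abstract does not specify the tagset, the rules, or how MLE is applied to unknown words, so the sketch assumes a plain bigram HMM with MLE counts and add-one smoothing as a stand-in for unknown-word handling. The toy English corpus, the function names `train`, `log_prob`, and `viterbi`, and the `<s>` start symbol are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
import math

START = "<s>"  # hypothetical sentence-start symbol


def train(tagged_sentences):
    """Collect MLE counts for transitions (prev_tag -> tag) and emissions (tag -> word)."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Smoothed log relative frequency.

    Add-alpha smoothing is an assumption of this sketch; it gives unseen
    words and unseen tag bigrams a small non-zero probability.
    """
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    tags = sorted(emit)
    n_tags = len(tags)
    # Vocabulary size plus one slot reserved for any unseen word.
    n_words = len({w for counts in emit.values() for w in counts}) + 1

    # best[i][tag] = (log score of best path ending in tag at position i, previous tag)
    best = [{
        t: (log_prob(trans[START], t, n_tags) + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)

    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


# Toy corpus (made up for illustration; not Persian data from the paper).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
trans, emit = train(corpus)
print(viterbi(["the", "cat", "barks"], trans, emit))
print(viterbi(["a", "bird", "sleeps"], trans, emit))  # "bird" is an unseen word
```

In this sketch an unseen word such as "bird" receives the same small emission probability under every tag, so its tag is decided by the transition probabilities of its neighbours. A tagger like the one described would presumably add morphological cues and rules at that point, which the sketch does not attempt.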
Pages: 423-430
Number of pages: 8