An Accurate Persian Part-of-Speech Tagger

Cited by: 0
Authors
Okhovvat, Morteza [1]
Sharifi, Mohsen [2]
Bidgoli, Behrouz Minaei [2]
Affiliations
[1] Golestan Univ Med Sci, Hlth Management & Social Dev Res Ctr, Gorgan, Golestan, Iran
[2] Iran Univ Sci & Technol, Sch Comp Engn, Tehran, Iran
Source
Computer Systems Science and Engineering, 2020, 35 (06)
Keywords
Hidden Markov Model; Maximum Likelihood Estimation; Morphology; POS Tagger; Viterbi Algorithm; LANGUAGE;
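DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract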
Processing any natural language requires that the grammatical properties of every word be tagged by a part-of-speech (POS) tagger. We propose IAoM, an improved POS tagger for the Persian language that supports properties of text-to-speech systems such as Lexical Stress Search, Homograph words Disambiguation, and Break Phrase Detection, as well as the main aspects of Persian morphology. IAoM uses Maximum Likelihood Estimation (MLE) to determine the tags of unknown words, together with a small set of predefined rules to improve accuracy. To tag the input corpus, IAoM uses a Hidden Markov Model (HMM) decoded with the Viterbi algorithm. For a fair evaluation, we ran experiments on both homogeneous and heterogeneous Persian corpora and studied the effect of training-set size on the accuracy of IAoM. The experimental results show that the proposed tagger achieves an overall accuracy of 97.6%.
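The abstract names the general recipe (an HMM whose probabilities are estimated from a tagged corpus, decoded with the Viterbi algorithm) without giving details. The following is a minimal sketch of that generic recipe, not IAoM itself: the toy English corpus, the function names `train`, `log_prob`, and `viterbi`, and the add-one smoothing used so that unknown words keep a non-zero probability are all assumptions made for illustration. IAoM's morphological rules and its specific handling of unknown words are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Collect tag-transition and word-emission counts from a tagged corpus."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Relative-frequency estimate with add-alpha smoothing (an assumption
    added here so that unseen words and transitions are not impossible)."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    tags = sorted(emit)
    vocab = {w for counts in emit.values() for w in counts}
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unknown words
    # best[i][t] = (log-probability of the best path ending in t, backpointer)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    [("the", "DET"), ("dog", "N"), ("barks", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]
trans, emit = train(corpus)
print(viterbi(["the", "cat", "barks"], trans, emit))  # ['DET', 'N', 'V']
```

Working in log space avoids numerical underflow on long sentences. A word absent from the training data receives the same smoothed emission probability under every tag, so in this sketch its tag is decided by the transition probabilities alone.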
Pages: 423-430
Page count: 8
Related papers
50 in total
  • [1] An accurate Persian part-of-speech tagger
    Okhovvat, Morteza
    Sharifi, Mohsen
    Bidgoli, Behrouz Minaei
    [J]. Computer Systems Science and Engineering, 2020, 35 (06): 423-430
  • [2] A Persian Part-Of-Speech Tagger Based on Morphological Analysis
    Mohseni, Mahdi
    Minaei-bidgoli, Behrouz
    [J]. LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010: 1253-1257
  • [3] Implementing an efficient part-of-speech tagger
    Carlberger, J
    Kann, V
    [J]. SOFTWARE-PRACTICE & EXPERIENCE, 1999, 29 (09): 815-832
  • [4] A Practical Part-of-Speech Tagger for Bengali
    Sarkar, Kamal
    Gayen, Vivekananda
    [J]. 2012 THIRD INTERNATIONAL CONFERENCE ON EMERGING APPLICATIONS OF INFORMATION TECHNOLOGY (EAIT), 2012: 36-40
  • [5] An Efficient Part-of-Speech Tagger for Arabic
    Kopru, Selcuk
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PT I, 2011, 6608: 202-213
  • [6] TnT - A statistical part-of-speech tagger
    Brants, T
    [J]. 6TH APPLIED NATURAL LANGUAGE PROCESSING CONFERENCE/1ST MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE AND PROCEEDINGS OF THE ANLP-NAACL 2000 STUDENT RESEARCH WORKSHOP, 2000: 224-231
  • [7] Tamil Part-of-Speech tagger based on SVMTool
    Dhanalakshmi, V
    Anandkumar, M.
    Vijaya, M. S.
    Loganathan, R.
    Soman, K. P.
    Rajendran, S.
    [J]. RECENT ADVANCES OF ASIAN LANGUAGE PROCESSING TECHNOLOGIES, 2008: 59-+
  • [8] Toward an Effective Igbo Part-of-Speech Tagger
    Onyenwe, Ikechukwu E.
    Hepple, Mark
    Chinedu, Uchechukwu
    Ezeani, Ignatius
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2019, 18 (04)
  • [9] A suffix based part-of-speech tagger for Turkish
    Dincer, Taner
    Karaoglan, Bahar
    Kisla, Tarik
    [J]. PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, 2008: 680-+
  • [10] MedPost: a part-of-speech tagger for bioMedical text
    Smith, L
    Rindflesch, T
    Wilbur, WJ
    [J]. BIOINFORMATICS, 2004, 20 (14): 2320-2321