A Persian Part-Of-Speech Tagger Based on Morphological Analysis

被引:0
|
作者
Mohseni, Mahdi [1 ]
Minaei-bidgoli, Behrouz [1 ]
机构
[1] Iran Univ Sci & Technol, Tehran, Iran
关键词
D O I
暂无
中图分类号
H [语言、文字];
学科分类号
05 ;
摘要
This paper describes a method based on morphological analysis of words for a Persian Part-Of-Speech (POS) tagging system. This is a main part of a process for expanding a large Persian corpus called Peyekare (or Textual Corpus of Persian Language). Peykare is arranged into two parts: annotated and unannotated parts. We use the annotated part in order to create an automatic morphological analyzer, a main segment of the system. Morphosyntactic features of Persian words cause two problems: the number of tags is increased in the corpus (586 tags) and the form of the words is changed. This high number of tags debilitates any taggers to work efficiently. From other side the change of word forms reduces the frequency of words with the same lemma; and the number of words belonging to a specific tag reduces as well. This problem also has a bad effect on statistical taggers. The morphological analyzer by removing the problems helps the tagger to cover a large number of tags in the corpus. Using a Markov tagger the method is evaluated on the corpus. The experiments show the efficiency of the method in Persian POS tagging.
引用
收藏
页码:1253 / 1257
页数:5
相关论文
共 50 条
  • [1] An Accurate Persian Part-of-Speech Tagger
    Okhovvat, Morteza
    Sharifi, Mohsen
    Bidgoli, Behrouz Minaei
    [J]. COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2020, 35 (06): : 423 - 430
  • [2] Adding Morphological Information to a Connectionist Part-Of-Speech Tagger
    Zamora-Martinez, Francisco
    Jose Castro-Bleda, Maria
    Espana-Boquera, Salvador
    Tortajada-Velert, Salvador
    [J]. CURRENT TOPICS IN ARTIFICIAL INTELLIGENCE, 2010, 5988 : 191 - +
  • [3] Tamil Part-of-Speech tagger based on SVMTool
    Dhanalakshmi, V
    Anandkumar, M.
    Vijaya, M. S.
    Loganathan, R.
    Soman, K. P.
    Rajendran, S.
    [J]. RECENT ADVANCES OF ASIAN LANGUAGE PROCESSING TECHNOLOGIES, 2008, : 59 - +
  • [4] A suffix based part-of-speech tagger for Turkish
    Dincer, Taner
    Karaoglan, Bahar
    Kisla, Tarik
    [J]. PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, 2008, : 680 - +
  • [5] A FARSI PART-OF-SPEECH TAGGER BASED on MARKOV MODEL
    Mohseni, Mahdi
    Motalebi, Hasan
    Minaei-bidgoli, Behrouz
    Shokrollahi-far, Mahmoud
    [J]. APPLIED COMPUTING 2008, VOLS 1-3, 2008, : 1588 - +
  • [6] Implementing an efficient part-of-speech tagger
    Carlberger, J
    Kann, V
    [J]. SOFTWARE-PRACTICE & EXPERIENCE, 1999, 29 (09): : 815 - 832
  • [7] A Practical Part-of-Speech Tagger for Bengali
    Sarkar, Kamal
    Gayen, Vivekananda
    [J]. 2012 THIRD INTERNATIONAL CONFERENCE ON EMERGING APPLICATIONS OF INFORMATION TECHNOLOGY (EAIT), 2012, : 36 - 40
  • [8] An Efficient Part-of-Speech Tagger for Arabic
    Kopru, Selcuk
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PT I, 2011, 6608 : 202 - 213
  • [9] Part-of-Speech Tagger Based on Maximum Entropy Model
    Huang Heyan
    Zhang Xiaofei
    [J]. 2009 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, VOL 3, 2009, : 26 - 29
  • [10] TnT - A statistical part-of-speech tagger
    Brants, T
    [J]. 6TH APPLIED NATURAL LANGUAGE PROCESSING CONFERENCE/1ST MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE AND PROCEEDINGS OF THE ANLP-NAACL 2000 STUDENT RESEARCH WORKSHOP, 2000, : 224 - 231