An Efficient Part-of-Speech Tagger for Arabic

Cited by: 0
Authors
Kopru, Selcuk [1]
Affiliations
[1] Teknol Yazilimevi Ltd, METU Technopolis, TR-06531 Ankara, Turkey
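Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes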
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present an efficient part-of-speech (POS) tagger for Arabic based on a hidden Markov model (HMM). We explore several enhancements to improve the baseline system. Despite the morphological complexity of Arabic, our approach is data-driven and, unlike many other Arabic POS taggers, does not use a morphological analyzer or a lexicon. This makes the approach simple, very efficient, and well suited to real-life applications, while the accuracy obtained remains comparable to that of other Arabic POS taggers. In the experiments, we also thoroughly investigate aspects of Arabic POS tagging that have not been examined in detail before, including tag sets and prefix and suffix analyses. Our tagger achieves an accuracy of 95.57% on a standard tagset for Arabic. A detailed error analysis is provided for a better evaluation of the system. We also applied the same approach to other languages, such as Farsi and German, to show that it is language independent. Accuracy rates for these languages are also provided.
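For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of a lexicon-free, purely data-driven bigram HMM tagger with Viterbi decoding. It is not the author's implementation: the add-one smoothing, the toy English corpus, and all names (`train`, `log_prob`, `viterbi`, `TOY_CORPUS`) are assumptions made for illustration, and the prefix and suffix analyses mentioned in the abstract are not modeled here.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start pseudo-tag

def train(tagged_sentences):
    """Count tag-bigram transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing scheme)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence for a list of words."""
    if not words:
        return []
    tags = sorted(emit)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    # Each column maps tag -> (best log score, backpointer to previous tag).
    columns = [{
        t: (log_prob(trans[START], t, n_tags)
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (columns[-1][p][0] + log_prob(trans[p], t, n_tags), p)
                for p in tags
            )
            column[t] = (score + log_prob(emit[t], word, n_words), prev)
        columns.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

# Toy training data, invented purely for illustration.
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
trans, emit, vocab = train(TOY_CORPUS)
```

Calling `viterbi(["a", "cat", "barks"], trans, emit, vocab)` tags a word sequence never seen in training. Because the model is estimated from counts alone, the same code applies to any language for which a tagged corpus exists, which is the property the abstract emphasizes.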
Pages: 202-213
Page count: 12