Arabic Part of Speech Tagging

Authors
Mohamed, Emad [1]
Kuebler, Sandra [1]
Affiliations
[1] Indiana Univ, Dept Linguist, Bloomington, IN 47405 USA
DOI
Not available
Chinese Library Classification number
H [Language, Writing]
Discipline classification code
05
Abstract
Arabic is a morphologically rich language, which makes part-of-speech (POS) tagging challenging. In this paper, we compare two novel methods for POS tagging of Arabic that do not rely on gold standard word segmentation but use the full POS tagset of the Penn Arabic Treebank. The first approach uses complex tags that describe full words and requires no word segmentation. The second approach is segmentation-based and uses a machine learning segmenter: words are first segmented, and the segments are then annotated with POS tags. Because the first approach is word-based, we evaluate full-word accuracy rather than segment accuracy. Word-based POS tagging yields better overall results than segment-based tagging (93.93% vs. 93.41%). Word-based tagging also gives the best results on known words, whereas the segmentation-based approach gives better results on unknown words. Combining both methods yields a word accuracy of 94.37%, which is very close to the result obtained with gold standard segmentation (94.91%).
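The abstract does not say how the two taggers are combined or how tags are encoded, so the following is only a minimal sketch under stated assumptions. It assumes that a word's complex tag can be formed by joining its segment tags with "+", that full-word accuracy counts a word as correct only when its entire complex tag matches the gold tag, and that the combination simply trusts the word-based tagger on known words and the segment-based tagger on unknown words. The function names, the toy tags, and the placeholder words are all hypothetical and are not taken from the paper or the Penn Arabic Treebank.

```python
from typing import List, Sequence, Set


def join_segment_tags(segment_tags: Sequence[str], sep: str = "+") -> str:
    """Collapse the tags of one word's segments into a single word-level tag.

    Assumption: a complex word tag is the segment tags joined by `sep`.
    """
    return sep.join(segment_tags)


def word_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of words whose complete tag matches the gold tag exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def combine_by_vocabulary(
    words: Sequence[str],
    word_based: Sequence[str],
    segment_based: Sequence[str],
    known_words: Set[str],
) -> List[str]:
    """Hypothetical combination rule: word-based tag for words seen in
    training, segment-based tag for unknown words."""
    if not (len(words) == len(word_based) == len(segment_based)):
        raise ValueError("all input sequences must have the same length")
    return [
        wb if w in known_words else sb
        for w, wb, sb in zip(words, word_based, segment_based)
    ]


# Toy data: placeholder words and illustrative tags, not real treebank output.
words = ["w1", "w2", "w3", "w4"]
known_words = {"w1", "w2", "w3"}  # "w4" plays the role of an unknown word
gold = ["CONJ+VERB", "DET+NOUN", "NOUN", "PREP+NOUN"]

# Output of a hypothetical word-based tagger (one complex tag per word).
word_based_pred = ["CONJ+VERB", "DET+NOUN", "NOUN", "NOUN"]

# Output of a hypothetical segment-based tagger (one tag per segment).
segment_pred = [["CONJ", "VERB"], ["DET", "ADJ"], ["NOUN"], ["PREP", "NOUN"]]
segment_based_pred = [join_segment_tags(tags) for tags in segment_pred]

combined_pred = combine_by_vocabulary(
    words, word_based_pred, segment_based_pred, known_words
)

print("word-based   :", word_accuracy(word_based_pred, gold))
print("segment-based:", word_accuracy(segment_based_pred, gold))
print("combined     :", word_accuracy(combined_pred, gold))
```

On this toy data each tagger gets three of four words right, and the vocabulary-based combination gets all four right. The numbers only illustrate the metric and the fallback idea; they say nothing about the results reported in the abstract.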
Pages: 2537-2543
Page count: 7