Multi-Dialect Arabic POS Tagging: A CRF Approach

Authors
Darwish, Kareem [1 ]
Mubarak, Hamdy [1 ]
Eldesouki, Mohamed [1 ]
Abdelali, Ahmed [1 ]
Samih, Younes [2 ]
Alharbi, Randah [3 ]
Attia, Mohammed [4 ]
Magdy, Walid [3 ]
Kallmeyer, Laura [2 ]
Affiliations
[1] QCRI, Ar Rayyan, Qatar
[2] Univ Dusseldorf, Dusseldorf, Germany
[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[4] Google Inc, Mountain View, CA USA
Keywords
Arabic dialects; POS tagging; CRF
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper introduces a new dataset of POS-tagged Arabic tweets in four major dialects, along with tagging guidelines. The data, which we are releasing publicly, includes tweets in Egyptian, Levantine, Gulf, and Maghrebi Arabic, 350 tweets per dialect, with train/test/development splits for 5-fold cross-validation. We use a Conditional Random Fields (CRF) sequence labeler to train POS taggers for each dialect, examine the effect of cross-dialect and joint-dialect training, and give benchmark results for the datasets. Using clitic n-grams, clitic metatypes, and stem templates as features, we train a joint model that tags all four dialects with an average accuracy of 89.3%.
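As a rough illustration of the "clitic n-grams" feature family named in the abstract, the following minimal sketch builds unigram and bigram features over a window of neighboring clitics. It is not the authors' implementation: the function names, the feature keys, the boundary symbols, and the toy segmented input are all assumptions made for this example. Clitic metatypes and stem templates are left out because this record does not define them. Per-position dictionaries of this kind are the sort of input that CRF toolkits such as sklearn-crfsuite accept.

```python
from typing import Dict, List


def clitic_ngram_features(clitics: List[str], i: int) -> Dict[str, str]:
    """Return illustrative n-gram features for the clitic at position i.

    Feature names and the window size (one clitic on each side) are
    assumptions for this sketch, not taken from the paper.
    """
    def at(j: int) -> str:
        # Pad with hypothetical boundary symbols outside the sequence.
        if j < 0:
            return "<s>"
        if j >= len(clitics):
            return "</s>"
        return clitics[j]

    return {
        "uni[0]": at(i),                          # current clitic
        "uni[-1]": at(i - 1),                     # previous clitic
        "uni[+1]": at(i + 1),                     # next clitic
        "bi[-1,0]": at(i - 1) + "|" + at(i),      # left bigram
        "bi[0,+1]": at(i) + "|" + at(i + 1),      # right bigram
    }


def sequence_features(clitics: List[str]) -> List[Dict[str, str]]:
    """Feature dictionaries for every position in one clitic sequence."""
    return [clitic_ngram_features(clitics, i) for i in range(len(clitics))]


# Toy input: a hypothetical segmentation of one transliterated word,
# with "+" marking where a prefix or suffix attaches to the stem.
example = ["w+", "ktb", "+hA"]
feats = sequence_features(example)
```

Each position gets its own dictionary, so a tagger trained on these features sees the identity of a clitic together with its immediate neighbors, which is the usual way n-gram context is exposed to a linear-chain CRF.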
Pages: 93-98
Number of pages: 6