Multi-Dialect Arabic POS Tagging: A CRF Approach

Authors
Darwish, Kareem [1]
Mubarak, Hamdy [1]
Eldesouki, Mohamed [1]
Abdelali, Ahmed [1]
Samih, Younes [2]
Alharbi, Randah [3]
Attia, Mohammed [4]
Magdy, Walid [3]
Kallmeyer, Laura [2]
Affiliations
[1] QCRI, Ar Rayyan, Qatar
[2] Univ Dusseldorf, Dusseldorf, Germany
[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[4] Google Inc, Mountain View, CA USA
Keywords
Arabic dialects; POS tagging; CRF
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper introduces a new dataset of POS-tagged Arabic tweets in four major dialects, along with tagging guidelines. The data, which we are releasing publicly, includes tweets in Egyptian, Levantine, Gulf, and Maghrebi, 350 tweets per dialect, with appropriate train/test/development splits for 5-fold cross-validation. We use a Conditional Random Fields (CRF) sequence labeler to train POS taggers for each dialect, examine the effect of cross-dialect and joint-dialect training, and give benchmark results for the datasets. Using clitic n-grams, clitic metatypes, and stem templates as features, we were able to train a joint model that can correctly tag four different dialects with an average accuracy of 89.3%.
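As a rough illustration of the 5-fold setup mentioned in the abstract, the sketch below partitions one dialect's 350 tweets into train/development/test splits. The fold assignment (one fold for test, the next for development, the remaining three for training), the shuffling seed, and the function name `five_fold_splits` are assumptions made for this example; the abstract does not specify how the released splits were built.

```python
import random


def five_fold_splits(items, n_folds=5, seed=0):
    """Yield (train, dev, test) lists, one triple per fold.

    Assumption: fold i is the test set, fold (i + 1) % n_folds is the
    development set, and the remaining folds form the training set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Strided slicing gives n_folds disjoint folds of near-equal size.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        dev_idx = (i + 1) % n_folds
        test = folds[i]
        dev = folds[dev_idx]
        train = [x for j, fold in enumerate(folds)
                 if j not in (i, dev_idx) for x in fold]
        yield train, dev, test


# Placeholder IDs standing in for one dialect's 350 tweets.
tweet_ids = list(range(350))
splits = list(five_fold_splits(tweet_ids))
for train, dev, test in splits:
    print(len(train), len(dev), len(test))
```

Under these assumptions each fold yields 210 training, 70 development, and 70 test tweets, and every tweet appears in exactly one test set across the five folds.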
Pages: 93-98
Page count: 6