Applying a Character-Level Model to a Short Arabic Dialect Sentence: A Saudi Dialect as a Case Study

Cited by: 2
Author
Alqurashi, Tahani [1]
Affiliation
[1] Umm Al Qura Univ, Coll Comp & Informat Syst, Informat Syst Dept, Mecca 24382, Saudi Arabia
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 23
Keywords
Arabic natural language processing; supervised learning approach; automatic dialect language identification; Saudi dialects; IDENTIFICATION; CORPUS;
DOI
10.3390/app122312435
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Arabic dialect identification (ADI) has recently drawn considerable interest among researchers in the language recognition and natural language processing fields. This study investigated the use of a character-level model, which is effectively unrestricted in its vocabulary, to identify fine-grained Arabic dialects in short written text. The study considered the four main Saudi dialects spoken across the country. The proposed ADI approach consists of five main phases: dialect data collection, data preprocessing and labelling, character-based feature extraction, character-based modelling (deep learning and classical machine learning), and model performance evaluation. Several classical machine learning methods, including logistic regression, stochastic gradient descent, variations of the naive Bayes models, and support vector classification, were applied to the dataset. For deep learning, a character-level convolutional neural network (CNN) model was adapted with a bidirectional long short-term memory approach. The collected data were tested under various classification tasks, including two-, three- and four-way ADI tasks. The results revealed that the classical machine learning algorithms outperformed the CNN approach. Moreover, term frequency-inverse document frequency weighting combined with character n-grams ranging from unigrams to four-grams achieved the best performance among the tested parameters.
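The character-based feature extraction phase mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`char_ngrams`, `tfidf_vectors`, `cosine`), the smoothed IDF formula, the L2 normalisation, and the sample strings are all assumptions chosen for illustration. Only the character n-gram range (unigrams to four-grams) and the use of TF-IDF weighting come from the abstract.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=4):
    """Return every character n-gram of length n_min..n_max found in text."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def tfidf_vectors(texts, n_min=1, n_max=4):
    """Map each text to a sparse dict {ngram: weight}, L2-normalised."""
    counts = [Counter(char_ngrams(t, n_min, n_max)) for t in texts]
    n_docs = len(texts)
    # Document frequency: number of texts in which each n-gram occurs.
    doc_freq = Counter(g for c in counts for g in c)
    # Assumed IDF variant (smoothed): ln((1 + N) / (1 + df)) + 1
    idf = {g: math.log((1 + n_docs) / (1 + df)) + 1
           for g, df in doc_freq.items()}
    vectors = []
    for c in counts:
        vec = {g: tf * idf[g] for g, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({g: v / norm for g, v in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(v * b.get(g, 0.0) for g, v in a.items())


if __name__ == "__main__":
    # Illustrative strings only; they are not drawn from the paper's dataset.
    samples = ["وش تسوي", "ايش تسوي", "hello world"]
    vecs = tfidf_vectors(samples)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Because the features are built from characters rather than words, any string can be vectorised without a fixed word vocabulary. Vectors of this kind could then be passed to a classifier such as logistic regression; that step is omitted here.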
Pages: 14
Related papers
16 in total
  • [1] Character-Level Dialect Identification in Arabic Using Long Short-Term Memory
    Sayadi, Karim
    Hamidi, Mansour
    Bui, Marc
    Liwicki, Marcus
    Fischer, Andreas
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2017, PT II, 2018, 10762 : 324 - 337
  • [2] MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)
    Ghoul, Dhaou
    Lejeune, Gael
    FOURTH ARABIC NATURAL LANGUAGE PROCESSING WORKSHOP (WANLP 2019), 2019, : 229 - 233
  • [3] A Character Level Convolutional BiLSTM for Arabic Dialect Identification
    Elaraby, Mohamed
    Zahran, Ahmed Ismail
    FOURTH ARABIC NATURAL LANGUAGE PROCESSING WORKSHOP (WANLP 2019), 2019, : 274 - 278
  • [4] Arabic dialect sentiment analysis with ZERO effort. Case study: Algerian dialect
    Guellil, Imane
    Mendoza, Marcelo
    Azouaou, Faical
    INTELIGENCIA ARTIFICIAL-IBEROAMERICAN JOURNAL OF ARTIFICIAL INTELLIGENCE, 2020, 23 (65): : 124 - 135
  • [5] A Character-level Short Text Classification Model Based On Spiking Neural Networks
    Jiang, Chengzhi
    Li, Linjing
    Zeng, Daniel Dajun
    Wang, Xiaoxuan
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [6] Using Character-Level Sequence-to-Sequence Model for Word Level Text Generation to Enhance Arabic Speech Recognition
    Azim, Mona A.
    Hussein, Wedad
    Badr, Nagwa L.
    IEEE ACCESS, 2023, 11 : 91173 - 91183
  • [7] Enhanced Emotion Analysis Model using Machine Learning in Saudi Dialect: COVID-19 Vaccination Case Study
    Mostafa, Abdulrahman O.
    Ahmed, Tarig M.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (01) : 356 - 369
  • [8] Character-level arabic text generation from sign language video using encoder-decoder model
    Boukdir, Abdelbasset
    Benaddy, Mohamed
    El Meslouhi, Othmane
    Kardouchi, Mustapha
    Akhloufi, Moulay
    DISPLAYS, 2023, 76
  • [9] Improving neural machine translation for low resource languages through non-parallel corpora: a case study of Egyptian dialect to modern standard Arabic translation
    Mohamed Atta Faheem
    Khaled Tawfik Wassif
    Hanaa Bayomi
    Sherif Mahdy Abdou
    Scientific Reports, 14
  • [10] Improving neural machine translation for low resource languages through non-parallel corpora: a case study of Egyptian dialect to modern standard Arabic translation
    Faheem, Mohamed Atta
    Wassif, Khaled Tawfik
    Bayomi, Hanaa
    Abdou, Sherif Mahdy
    SCIENTIFIC REPORTS, 2024, 14 (01)