Effective Deep Learning Models for Automatic Diacritization of Arabic Text

Cited by: 6
Authors
Madhfar, Mokthar Ali Hasan [1]
Qamar, Ali Mustafa [1, 2]
Affiliations
[1] Qassim Univ, Dept Comp Sci, Coll Comp, Buraydah, Saudi Arabia
[2] Natl Univ Sci & Technol, Sch Elect Engn & Comp Sci, Dept Comp, Islamabad 44000, Pakistan
Source
IEEE ACCESS | 2021 / Vol. 9
Keywords
Arabic language; Tacotron; diacritization; deep learning; text-to-speech
DOI
10.1109/ACCESS.2020.3041676
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
While building a text-to-speech system for the Arabic language, we found that the system synthesized speech with many pronunciation errors. The primary source of these errors is the lack of diacritics in Modern Standard Arabic writing. These diacritics are small strokes that appear above or below each letter to provide pronunciation and grammatical information. We propose three deep learning models to recover Arabic text diacritics, based on our work on a deep learning text-to-speech synthesis system. The first model is a baseline used to test how a simple deep learning model performs on the corpora. The second model is based on an encoder-decoder architecture, which resembles our text-to-speech synthesis model with many modifications to suit this problem. The last model is based on the encoder part of the text-to-speech model and achieves state-of-the-art performance in both word error rate and diacritic error rate. These models will benefit a wide range of natural language processing applications such as text-to-speech, part-of-speech tagging, and machine translation.
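The two metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names are hypothetical, the diacritic set is assumed to be the Unicode range U+064B to U+0652, every non-space character is counted as a diacritic position, and a word counts as wrong if any of its characters carries the wrong diacritics. Published evaluations often differ on these conventions (for example, whether case endings or undiacritized characters are counted).

```python
# Assumption: Arabic diacritics are the combining marks U+064B..U+0652
# (tanween, short vowels, shadda, sukun).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def strip_diacritics(text):
    """Remove diacritics, leaving only base characters."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def split_diacritics(word):
    """Return a list of (base_char, marks) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # a mark with no preceding base character is ignored
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def der_wer(reference, prediction):
    """Return (diacritic error rate, word error rate) as fractions.

    Both texts must share the same base letters; only diacritics may differ.
    """
    if strip_diacritics(reference) != strip_diacritics(prediction):
        raise ValueError("texts differ in their base letters")

    char_total = char_errors = word_total = word_errors = 0
    for ref_word, hyp_word in zip(reference.split(), prediction.split()):
        ref_pairs = split_diacritics(ref_word)
        hyp_pairs = split_diacritics(hyp_word)
        # Compare marks order-insensitively (shadda and vowel may be swapped).
        wrong = sum(
            sorted(r_marks) != sorted(h_marks)
            for (_, r_marks), (_, h_marks) in zip(ref_pairs, hyp_pairs)
        )
        char_total += len(ref_pairs)
        char_errors += wrong
        word_total += 1
        word_errors += 1 if wrong else 0

    der = char_errors / char_total if char_total else 0.0
    wer = word_errors / word_total if word_total else 0.0
    return der, wer
```

For a two-word reference in which the prediction gets one diacritic wrong out of five characters, this sketch reports a diacritic error rate of 1/5 and a word error rate of 1/2.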
Pages: 273 - 288
Page count: 16
Related papers
50 in total
  • [41] Cyberbullying Detection Model for Arabic Text Using Deep Learning
    Albayari, Reem
    Abdallah, Sherief
    Shaalan, Khaled
    [J]. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2024
  • [42] Sarcasm Detection in Arabic Short Text using Deep Learning
    Al-Jamal, Wafa' Q.
    Mustafa, Ahmad M.
    Ali, Mostafa Z.
    [J]. 2022 13TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2022, : 362 - 366
  • [43] A Proposed Deep Learning based Framework for Arabic Text Classification
    Sayed, Mostafa
    Abdelkader, Hatem
    Khedr, Ayman E.
    Salem, Rashed
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (08) : 305 - 313
  • [44] Correction to: Arabic text summarization using deep learning approach
    Molham Al-Maleh
    Said Desouki
    [J]. Journal of Big Data, 8
  • [45] Neural Arabic Text Diacritization: State-of-the-Art Results and a Novel Approach for Arabic NLP Downstream Tasks
    Fadel, Ali
    Tuffaha, Ibraheem
    Al-Ayyoub, Mahmoud
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (01)
  • [46] Automatic dottization of Arabic text (Rasms) using deep recurrent neural networks
    Alhathloul, Zainab
    Ahmad, Irfan
    [J]. PATTERN RECOGNITION LETTERS, 2022, 162 : 47 - 55
  • [47] Using Unsupervised Deep Learning for Automatic Summarization of Arabic Documents
    Alami, Nabil
    En-nahnahi, Noureddine
    Ouatik, Said Alaoui
    Meknassi, Mohammed
    [J]. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2018, 43 (12) : 7803 - 7815
  • [48] Using Unsupervised Deep Learning for Automatic Summarization of Arabic Documents
    Nabil Alami
    Noureddine En-nahnahi
    Said Alaoui Ouatik
    Mohammed Meknassi
    [J]. Arabian Journal for Science and Engineering, 2018, 43 : 7803 - 7815
  • [49] Deep Transformer Language Models for Arabic Text Summarization: A Comparison Study
    Chouikhi, Hasna
    Alsuhaibani, Mohammed
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (23)
  • [50] Deep Neural Network Models for Paraphrased Text Classification in the Arabic Language
    Mahmoud, Adnen
    Zrigui, Mounir
    [J]. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2019), 2019, 11608 : 3 - 16