Effective Deep Learning Models for Automatic Diacritization of Arabic Text

Cited by: 6
Authors
Madhfar, Mokthar Ali Hasan [1]
Qamar, Ali Mustafa [1, 2]
Affiliations
[1] Qassim Univ, Dept Comp Sci, Coll Comp, Buraydah, Saudi Arabia
[2] Natl Univ Sci & Technol, Sch Elect Engn & Comp Sci, Dept Comp, Islamabad 44000, Pakistan
Source
IEEE ACCESS | 2021 / Vol. 9
Keywords
Arabic language; Tacotron; diacritization; deep learning; text-to-speech
DOI
10.1109/ACCESS.2020.3041676
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
While building a text-to-speech system for the Arabic language, we found that the system synthesized speech with many pronunciation errors. The primary source of these errors is the lack of diacritics in Modern Standard Arabic writing. These diacritics are small strokes that appear above or below each letter to provide pronunciation and grammatical information. Building on our work on a deep learning text-to-speech synthesis system, we propose three deep learning models to recover Arabic text diacritics. The first model is a baseline used to test how a simple deep learning model performs on the corpora. The second model is based on an encoder-decoder architecture that resembles our text-to-speech synthesis model, with many modifications to suit this problem. The last model is based on the encoder part of the text-to-speech model and achieves state-of-the-art performance in both word error rate and diacritic error rate metrics. These models will benefit a wide range of natural language processing applications such as text-to-speech, part-of-speech tagging, and machine translation.
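The abstract reports results as diacritic error rate (DER) and word error rate (WER) but does not define them in this record. The sketch below shows one common, simplified way such metrics are computed, and it is not taken from the paper: DER is assumed to be the fraction of letters whose diacritics differ from the reference, and WER the fraction of words with at least one such letter. The function names `split_diacritics` and `error_rates`, the choice of the code point range U+064B to U+0652, and the decision to count every letter (including undiacritized ones) are all assumptions made for illustration.

```python
# Assumption: diacritics are the Arabic marks fathatan..sukun (U+064B-U+0652).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Return (base_letters, marks), where marks[i] holds the diacritics on letter i."""
    letters, marks = [], []
    for ch in word:
        if ch in DIACRITICS:
            if letters:  # ignore a stray mark that precedes any letter
                marks[-1] += ch
        else:
            letters.append(ch)
            marks.append("")
    return "".join(letters), marks


def error_rates(reference, hypothesis):
    """Return (DER, WER) under the simplified definitions stated in the lead-in."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis must have the same word count")
    total_letters = wrong_letters = wrong_words = 0
    for ref, hyp in zip(ref_words, hyp_words):
        ref_base, ref_marks = split_diacritics(ref)
        hyp_base, hyp_marks = split_diacritics(hyp)
        if ref_base != hyp_base:
            raise ValueError("base letters differ; only diacritics may change")
        # Sort marks so that shadda+vowel order does not count as an error.
        errors = sum(sorted(a) != sorted(b) for a, b in zip(ref_marks, hyp_marks))
        total_letters += len(ref_marks)
        wrong_letters += errors
        wrong_words += errors > 0
    return wrong_letters / total_letters, wrong_words / len(ref_words)


# Two-word example: the hypothesis gets the final case ending wrong
# (damma U+064F instead of dammatan U+064C).
reference = "\u0643\u064E\u062A\u064E\u0628\u064E \u0648\u064E\u0644\u064E\u062F\u064C"
hypothesis = "\u0643\u064E\u062A\u064E\u0628\u064E \u0648\u064E\u0644\u064E\u062F\u064F"
der, wer = error_rates(reference, hypothesis)
print(f"DER={der:.3f} WER={wer:.3f}")
```

In this example one of six letters and one of two words are wrong. Published evaluations often report variants of these metrics (for instance, with or without case endings), so the numbers from this sketch are not directly comparable to the paper's results.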
Pages: 273-288
Page count: 16