Arabic Text Diacritization Using Deep Neural Networks

Times cited: 17
Authors
Fadel, Ali [1]
Tuffaha, Ibraheem [1]
Al-Jawarneh, Bara [1]
Al-Ayyoub, Mahmoud [1]
Affiliation
[1] Jordan Univ Sci & Technol, Irbid, Jordan
Keywords
Deep Learning; Arabic text diacritization; Deep Neural Network; Automatic diacritization
DOI
10.1109/cais.2019.8769512
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Diacritization of Arabic text is an interesting yet challenging problem, with applications ranging from speech synthesis to helping students learn the Arabic language. As with many other tasks in Arabic language processing, progress on this problem has been hindered by limited research effort and a lack of available (open-source) resources. This work provides a critical review of the currently existing systems, measures, and resources for Arabic text diacritization. Moreover, it introduces a much-needed, freely available, cleaned dataset that can easily be used to benchmark any work on Arabic diacritization. Extracted from the Tashkeela Corpus, the dataset consists of 55K lines containing about 2.3M words. After constructing the dataset, existing tools and systems are tested on it. The experimental results show that the neural Shakkala system significantly outperforms traditional rule-based approaches and other closed-source tools, achieving a Diacritic Error Rate (DER) of 2.88% compared with 13.78%, the best DER among the non-neural approaches (obtained by the Mishkal tool).
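The abstract reports results as a Diacritic Error Rate (DER). The record does not spell out the exact convention used, so the following is only a minimal sketch of one common-style definition, under stated assumptions: the reference and predicted texts share the same undiacritized base text, only base Arabic letters (U+0621 to U+064A) are scored, letters carrying no diacritic still count in the denominator, and the marks U+064B to U+0652 (tanween, short vowels, shadda, sukun) are the diacritics. The function names `split_diacritics` and `diacritic_error_rate` are hypothetical and are not taken from the paper or from the Shakkala or Mishkal tools.

```python
# Hypothetical DER sketch (not the paper's evaluation script).
# Assumptions: reference and prediction share the same base text; every
# Arabic letter is scored, including letters that carry no diacritic.

# Tanween, fatha/damma/kasra, shadda and sukun: U+064B..U+0652.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def is_arabic_letter(ch: str) -> bool:
    """True for base Arabic letters in the range U+0621..U+064A."""
    return "\u0621" <= ch <= "\u064A"


def split_diacritics(text: str) -> list[tuple[str, str]]:
    """Pair each base character with the diacritics that follow it.

    Marks are sorted so that, e.g., shadda+fatha and fatha+shadda compare equal.
    """
    pairs: list[tuple[str, list[str]]] = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                pairs[-1][1].append(ch)
            # a stray leading mark has nothing to attach to, so it is skipped
        else:
            pairs.append((ch, []))
    return [(base, "".join(sorted(marks))) for base, marks in pairs]


def diacritic_error_rate(reference: str, predicted: str) -> float:
    """Fraction of scored Arabic letters whose diacritics differ."""
    ref = split_diacritics(reference)
    hyp = split_diacritics(predicted)
    if [b for b, _ in ref] != [b for b, _ in hyp]:
        raise ValueError("reference and prediction must share the same base text")
    scored = [(r, h) for (b, r), (_, h) in zip(ref, hyp) if is_arabic_letter(b)]
    if not scored:
        return 0.0
    errors = sum(r != h for r, h in scored)
    return errors / len(scored)


if __name__ == "__main__":
    ref = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf, ta, ba, each with fatha
    hyp = "\u0643\u064E\u062A\u064E\u0628\u064F"  # last letter predicted with damma
    print(f"DER = {100 * diacritic_error_rate(ref, hyp):.2f}%")
```

In this toy example one of three letters is wrong, so the sketch gives a DER of one third (33.33%). Published DER figures such as 2.88% are usually reported as percentages over a whole test set, and variants exist (for instance, excluding word-final case endings or letters without diacritics), so numbers computed this way are not directly comparable to the paper's unless the same convention is confirmed.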
Pages: 7