Automatic diacritization of Arabic text using recurrent neural networks

Authors
Gheith A. Abandah
Alex Graves
Balkees Al-Shagoor
Alaa Arabiyat
Fuad Jamour
Majid Al-Taee
Affiliations
[1] Computer Engineering Department, University of Jordan
[2] Google DeepMind
[3] King Abdullah University of Science and Technology
Keywords
Automatic diacritization; Arabic text; Machine learning; Sequence transcription; Recurrent neural networks; Deep neural networks; Long short-term memory
Abstract
This paper presents a sequence transcription approach for the automatic diacritization of Arabic text. A recurrent neural network is trained to transcribe undiacritized Arabic text into fully diacritized sentences. We use a deep bidirectional long short-term memory network that builds high-level linguistic abstractions of text and exploits long-range context in both input directions. This approach differs from previous approaches in that no lexical, morphological, or syntactic analysis is performed on the data before the network processes it. Nonetheless, when the network output is post-processed with our error correction techniques, it achieves state-of-the-art performance, yielding average diacritic and word error rates of 2.09% and 5.82%, respectively, on samples from 11 books. For the LDC ATB3 benchmark, this approach reduces the diacritic error rate by 25%, the word error rate by 20%, and the last-letter diacritization error rate by 33% relative to the best published results.
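The two headline metrics in the abstract can be illustrated with a small sketch. It assumes that the diacritic error rate is the fraction of letters whose predicted diacritics differ from the reference, and that the word error rate is the fraction of words containing at least one such letter. The paper's exact counting rules (for example, how punctuation or undiacritized letters are treated) may differ, and the names `split_letters` and `error_rates` are hypothetical.

```python
# Arabic diacritic marks from fathatan to sukun (Unicode U+064B..U+0652).
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_letters(word):
    """Return a list of (letter, diacritics) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS and pairs:
            # Attach the mark to the letter that precedes it.
            letter, marks = pairs[-1]
            pairs[-1] = (letter, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def error_rates(reference, hypothesis):
    """Return (diacritic_error_rate, word_error_rate) as fractions.

    Assumption: both strings contain the same words and base letters
    and differ only in their diacritics.
    """
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("word counts differ")
    letters = letter_errors = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_pairs = split_letters(ref_word)
        hyp_pairs = split_letters(hyp_word)
        if [l for l, _ in ref_pairs] != [l for l, _ in hyp_pairs]:
            raise ValueError("base letters differ")
        # Sort the marks so that shadda+vowel order does not matter.
        wrong = sum(
            sorted(ref_marks) != sorted(hyp_marks)
            for (_, ref_marks), (_, hyp_marks) in zip(ref_pairs, hyp_pairs)
        )
        letters += len(ref_pairs)
        letter_errors += wrong
        word_errors += 1 if wrong else 0
    if letters == 0:
        raise ValueError("empty input")
    return letter_errors / letters, word_errors / len(ref_words)


# Two words (kataba al-waladu); the hypothesis gets the final
# diacritic of the second word wrong (fatha instead of damma).
reference = ("\u0643\u064E\u062A\u064E\u0628\u064E "
             "\u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064F")
hypothesis = ("\u0643\u064E\u062A\u064E\u0628\u064E "
              "\u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064E")
der, wer = error_rates(reference, hypothesis)
print(f"DER = {der:.3f}, WER = {wer:.3f}")
```

In this toy case one of eight letters and one of two words are wrong, which shows why the word error rate is typically several times the diacritic error rate, as in the figures reported above.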
Pages: 183-197
Page count: 14
Related papers
  • [1] Automatic diacritization of Arabic text using recurrent neural networks
    Abandah, Gheith A.
    Graves, Alex
    Al-Shagoor, Balkees
    Arabiyat, Alaa
    Jamour, Fuad
    Al-Taee, Majid
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2015, 18 (02) : 183 - 197
  • [2] Arabic Text Diacritization Using Deep Neural Networks
    Fadel, Ali
    Tuffaha, Ibraheem
    Al-Jawarneh, Bara
    Al-Ayyoub, Mahmoud
    2019 2ND INTERNATIONAL CONFERENCE ON COMPUTER APPLICATIONS & INFORMATION SECURITY (ICCAIS), 2019
  • [3] On the Training of Deep Neural Networks for Automatic Arabic-Text Diacritization
    Karim, Asma Abdel
    Abandah, Gheith
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (08) : 276 - 286
  • [4] Investigating Hybrid Approaches for Arabic Text Diacritization with Recurrent Neural Networks
    Alqudah, Saba'
    Abandah, Gheith
    Arabiyat, Alaa
    2017 IEEE JORDAN CONFERENCE ON APPLIED ELECTRICAL ENGINEERING AND COMPUTING TECHNOLOGIES (AEECT), 2017
  • [5] ACCURATE AND FAST RECURRENT NEURAL NETWORK SOLUTION FOR THE AUTOMATIC DIACRITIZATION OF ARABIC TEXT
    Abandah, Gheith
    Abdel-Karim, Asma
    JORDANIAN JOURNAL OF COMPUTERS AND INFORMATION TECHNOLOGY, 2020, 6 (02) : 103 - 121
  • [6] Automatic dottization of Arabic text (Rasms) using deep recurrent neural networks
    Alhathloul, Zainab
    Ahmad, Irfan
    PATTERN RECOGNITION LETTERS, 2022, 162 : 47 - 55
  • [7] Automatic Methods and Neural Networks in Arabic Texts Diacritization: A Comprehensive Survey
    Almanea, Manar M.
    IEEE ACCESS, 2021, 9 (09) : 145012 - 145032
  • [8] Arabic Text Generation Using Recurrent Neural Networks
    Souri, Adnan
    El Maazouzi, Zakaria
    Al Achhab, Mohammed
    Eddine El Mohajir, Badr
    BIG DATA, CLOUD AND APPLICATIONS, BDCA 2018, 2018, 872 : 523 - 533
  • [9] A Comparative Study of Some Automatic Arabic Text Diacritization Systems
    Mijlad, Ali
    El Younoussi, Yacine
    ADVANCES IN HUMAN-COMPUTER INTERACTION, 2022, 2022
  • [10] Effective Deep Learning Models for Automatic Diacritization of Arabic Text
    Madhfar, Mokthar Ali Hasan
    Qamar, Ali Mustafa
    IEEE ACCESS, 2021, 9 : 273 - 288