Automatic diacritization of Arabic text using recurrent neural networks

Authors
Gheith A. Abandah
Alex Graves
Balkees Al-Shagoor
Alaa Arabiyat
Fuad Jamour
Majid Al-Taee
Affiliations
[1] Computer Engineering Department, University of Jordan
[2] Google DeepMind
[3] King Abdullah University of Science and Technology
Keywords
Automatic diacritization; Arabic text; Machine learning; Sequence transcription; Recurrent neural networks; Deep neural networks; Long short-term memory
DOI
Not available
Abstract
This paper presents a sequence transcription approach for the automatic diacritization of Arabic text. A recurrent neural network is trained to transcribe undiacritized Arabic text into fully diacritized sentences. We use a deep bidirectional long short-term memory network that builds high-level linguistic abstractions of text and exploits long-range context in both input directions. This approach differs from previous approaches in that no lexical, morphological, or syntactic analysis is performed on the data before it is processed by the network. Nonetheless, when the network output is post-processed with our error correction techniques, it achieves state-of-the-art performance, yielding average diacritic and word error rates of 2.09% and 5.82%, respectively, on samples from 11 books. On the LDC ATB3 benchmark, this approach reduces the diacritic error rate by 25%, the word error rate by 20%, and the last-letter diacritization error rate by 33% compared with the best published results.
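The two headline metrics in the abstract, diacritic error rate and word error rate, can be illustrated with a minimal sketch. The counting conventions below are assumptions made for illustration (every non-space character is scored, and a word is wrong if any of its characters carries the wrong diacritics); the abstract does not state the paper's exact scoring rules, and the function names are hypothetical.

```python
# Arabic diacritic marks (tashkeel): U+064B (fathatan) through U+0652 (sukun).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_diacritics(text):
    """Return (base_char, marks) pairs; marks are sorted so order does not matter."""
    pairs = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return [(base, "".join(sorted(marks))) for base, marks in pairs]


def strip_diacritics(text):
    """Produce the undiacritized form, i.e. the kind of input a diacritizer receives."""
    return "".join(base for base, _ in split_diacritics(text))


def diacritic_error_rate(reference, hypothesis):
    """Fraction of non-space characters whose diacritics differ (assumed convention)."""
    ref = split_diacritics(reference)
    hyp = split_diacritics(hypothesis)
    if [b for b, _ in ref] != [b for b, _ in hyp]:
        raise ValueError("reference and hypothesis must share the same base letters")
    scored = [(r, h) for (b, r), (_, h) in zip(ref, hyp) if not b.isspace()]
    if not scored:
        return 0.0
    return sum(r != h for r, h in scored) / len(scored)


def word_error_rate(reference, hypothesis):
    """Fraction of words with at least one diacritic error (assumed convention)."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis must have the same number of words")
    if not ref_words:
        return 0.0
    errors = sum(
        split_diacritics(r) != split_diacritics(h)
        for r, h in zip(ref_words, hyp_words)
    )
    return errors / len(ref_words)


# Two three-letter words; the hypothesis gets only the final diacritic wrong
# (damma U+064F instead of dammatan U+064C).
reference = "\u0643\u064E\u062A\u064E\u0628\u064E \u0648\u064E\u0644\u064E\u062F\u064C"
hypothesis = "\u0643\u064E\u062A\u064E\u0628\u064E \u0648\u064E\u0644\u064E\u062F\u064F"
der = diacritic_error_rate(reference, hypothesis)
wer = word_error_rate(reference, hypothesis)
```

In this toy pair, one of six letters and one of two words are wrong, which shows why a word error rate is typically several times larger than the corresponding diacritic error rate.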
Pages: 183-197
Page count: 14