DNN-based grapheme-to-phoneme conversion for Arabic text-to-speech synthesis

Cited by: 0
Authors
Ikbel Hadj Ali
Zied Mnasri
Zied Lachiri
Affiliations
[1] Ecole Nationale d’Ingénieurs de Tunis, Signal, Image and Technology of Information Laboratory, Electrical Engineering Department
[2] University Tunis El-Manar, Signal, Image and Technology of Information Laboratory, Electrical Engineering Department
Keywords
Arabic text-to-speech synthesis; Deep neural networks (DNN); Grapheme-to-phoneme conversion; Diacritic signs; Gemination;
DOI: not available
Abstract
Arabic text-to-speech synthesis from non-diacritized text remains a major challenge, owing to the unique rules and characteristics of the Arabic language. Indeed, the diacritic and gemination signs, special characters representing short vowels and consonant doubling respectively, have a major effect on the accurate pronunciation of Arabic. However, these signs are often omitted from written texts, since most Arab readers are used to inferring them from context. To tackle this issue, this paper presents a grapheme-to-phoneme conversion system for Arabic, which constitutes the text-processing module of a deep neural network (DNN)-based Arabic TTS system. For Arabic text, this processing starts with predicting the diacritic and gemination signs, a step realized here entirely with DNNs. Finally, the grapheme-to-phoneme conversion of the diacritized text is achieved using the Buckwalter code. In comparison to state-of-the-art approaches, the proposed system achieves a higher accuracy rate, both over all phonemes and per class, and high precision, recall and F1 score for each class of diacritic signs.
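The final step described in the abstract maps each character of the diacritized text to a symbol of the Buckwalter transliteration scheme, a standard one-to-one ASCII encoding of Arabic script. A minimal sketch of that mapping, covering only a small subset of the table (the function name and table contents shown here are illustrative, not taken from the paper):

```python
# Partial Buckwalter transliteration table: each Arabic letter or
# diacritic sign maps to exactly one ASCII symbol.
BUCKWALTER = {
    "\u0628": "b",   # ba
    "\u062A": "t",   # ta
    "\u0643": "k",   # kaf
    "\u0644": "l",   # lam
    "\u0645": "m",   # mim
    "\u0627": "A",   # alif
    "\u064E": "a",   # fatha  (short vowel /a/)
    "\u064F": "u",   # damma  (short vowel /u/)
    "\u0650": "i",   # kasra  (short vowel /i/)
    "\u0651": "~",   # shadda (gemination sign)
    "\u0652": "o",   # sukun  (no vowel)
}

def to_buckwalter(text: str) -> str:
    """Map each character to its Buckwalter symbol (identity for unknowns)."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)

# The fully diacritized verb "kataba" (he wrote): kaf+fatha, ta+fatha, ba+fatha.
print(to_buckwalter("\u0643\u064E\u062A\u064E\u0628\u064E"))  # kataba
```

Because every short vowel and the gemination sign has its own symbol, the Buckwalter output of a fully diacritized word is already a direct phonemic reading, which is why diacritic prediction is the hard part of the pipeline.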
Pages: 569–584 (15 pages)
Related papers (50 in total)
  • [1] DNN-based grapheme-to-phoneme conversion for Arabic text-to-speech synthesis
    Ali, Ikbel Hadj
    Mnasri, Zied
    Lachiri, Zied
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (03) : 569 - 584
  • [2] Automatic Grapheme-to-Phoneme Conversion of Arabic Text
    Al-Daradkah, Belal
    Al-Diri, Bashir
    [J]. 2015 SCIENCE AND INFORMATION CONFERENCE (SAI), 2015, : 468 - 473
  • [3] ERROR DETECTION OF GRAPHEME-TO-PHONEME CONVERSION IN TEXT-TO-SPEECH SYNTHESIS USING SPEECH SIGNAL AND LEXICAL CONTEXT
    Vythelingum, Kevin
    Esteve, Yannick
    Rosec, Olivier
    [J]. 2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 692 - 697
  • [4] A unified approach to grapheme-to-phoneme conversion for the PLATTOS Slovenian text-to-speech system
    Rojc, Matej
    Kacic, Zdravko
    [J]. APPLIED ARTIFICIAL INTELLIGENCE, 2007, 21 (06) : 563 - 603
  • [5] Memory-based Data-driven Approach for Grapheme-to-Phoneme Conversion in Bengali Text-to-Speech Synthesis System
    Ghosh, Krishnendu
    Rao, K. Sreenivasa
    [J]. 2011 ANNUAL IEEE INDIA CONFERENCE (INDICON-2011): ENGINEERING SUSTAINABLE SOLUTIONS, 2011,
  • [6] Objective evaluation of grapheme to phoneme conversion for text-to-speech synthesis in French
    Yvon, F
    de Mareuil, PB
    d'Alessandro, C
    Auberge, V
    Bagein, M
    Bailly, G
    Bechet, F
    Foukia, S
    Goldman, JF
    Keller, E
    O'Shaughnessy, D
    Pagel, V
    Sannier, F
    Veronis, J
    Zellner, B
    [J]. COMPUTER SPEECH AND LANGUAGE, 1998, 12 (04) : 393 - 410
  • [7] Text-To-Speech with cross-lingual Neural Network-based grapheme-to-phoneme models
    Gonzalvo, Xavi
    Podsiadlo, Monika
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 765 - 769
  • [8] DNN-Based Arabic Speech Synthesis
    Amrouche, Aissa
    Bentrcia, Youssouf
    Boubakeur, Khadidja Nesrine
    Abed, Ahcene
    [J]. 2022 9TH INTERNATIONAL CONFERENCE ON ELECTRICAL AND ELECTRONICS ENGINEERING (ICEEE 2022), 2022, : 378 - 382
  • [9] Transformer based Grapheme-to-Phoneme Conversion
    Yolchuyeva, Sevinj
    Nemeth, Geza
    Gyires-Toth, Balint
    [J]. INTERSPEECH 2019, 2019, : 2095 - 2099
  • [10] Arabic grapheme-to-phoneme conversion based on joint multi-gram model
    Cherifi, El-Hadi
    Guerti, Mhania
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (01) : 173 - 182