Arabic grapheme-to-phoneme conversion based on joint multi-gram model

Cited: 0
Authors
Cherifi, El-Hadi [1,2]
Guerti, Mhania [1]
Affiliations
[1] Ecole Natl Polytech, Signal & Commun Lab, Algiers, Algeria
[2] Univ Tlemcen, Tilimsen, Algeria
Keywords
Grapheme-to-phoneme; Conversion; Joint multi-gram model; Text-to-speech; Modern Standard Arabic
DOI
10.1007/s10772-020-09779-8
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
Grapheme-to-phoneme (G2P) conversion, a necessary component of text-to-speech (TTS) systems, aims to predict a sequence of phonemes from a sequence of graphemes. For most languages, this task reduces to concatenating segment pronunciations within a word, and concatenating word pronunciations within an utterance. This approach is not viable for some languages, however, such as Arabic, where transitions between sounds within a word, and between words in an utterance, change their pronunciation according to several considerations depending on the orthographic, phonetic and phonological context. In this work, we propose an approach to Arabic G2P conversion based on a probabilistic method: the joint multi-gram model (JMM). With this approach, we do not need to explicitly model all the G2P correspondence anomalies detailed in this paper; instead, all this knowledge is captured implicitly at the learning stage. We discuss the experiments and results of this method applied to a pronunciation dictionary of the most commonly used Arabic words, and to carefully chosen and annotated texts of continuous speech. The current results do not surpass the baseline system, but they point the way towards future improvements: they are quite satisfactory on the dictionary adopted for training and testing, with a phoneme error rate (PER) of just over 10%, and on the continuous-speech corpus, with a PER of just over 11%.
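The JMM treats a pronunciation as a single sequence of joint grapheme-phoneme units ("graphones") and places an n-gram model over that joint sequence, so contextual pronunciation changes are absorbed by the statistics rather than by hand-written rules. The following is a minimal Python sketch of the idea, assuming hand-aligned training data and an unsmoothed bigram; a full JMM (cf. Bisani and Ney's joint-sequence models, listed below) learns the grapheme-phoneme segmentation with EM and uses smoothed higher-order n-grams. The toy lexicon, romanization, and all identifiers here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a joint multi-gram ("graphone") G2P model.
# Assumption: training words arrive pre-aligned as (grapheme-chunk,
# phoneme-chunk) pairs; a real JMM infers this segmentation with EM
# and replaces the raw bigram below with a smoothed higher-order n-gram.
from collections import defaultdict

BOS = ("<s>", "<s>")    # sentinel graphone marking the word start
EOS = ("</s>", "</s>")  # sentinel graphone marking the word end

def train_bigram(aligned_words):
    """Maximum-likelihood bigram probabilities over graphone sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in aligned_words:
        seq = [BOS] + word + [EOS]
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    probs = {}
    for prev, nxt in counts.items():
        total = sum(nxt.values())
        probs[prev] = {g: c / total for g, c in nxt.items()}
    return probs

def g2p(word, probs, beam=5):
    """Beam search over graphone sequences whose grapheme side spells `word`."""
    hyps = [([], BOS, 1.0, 0)]  # (phonemes so far, previous graphone, prob, chars read)
    best = None
    while hyps:
        expanded = []
        for phons, prev, p, i in hyps:
            if i == len(word):  # whole word consumed: close the sequence with EOS
                score = p * probs.get(prev, {}).get(EOS, 0.0)
                if score > 0 and (best is None or score > best[1]):
                    best = (phons, score)
                continue
            for (g, ph), q in probs.get(prev, {}).items():
                if (g, ph) != EOS and word.startswith(g, i):
                    expanded.append((phons + [ph], (g, ph), p * q, i + len(g)))
        expanded.sort(key=lambda h: -h[2])  # keep the `beam` most probable hypotheses
        hyps = expanded[:beam]
    return best[0] if best else None

# Toy hand-aligned lexicon in an ad hoc romanization (purely illustrative):
lexicon = [
    [("k", "k"), ("i", "i"), ("t", "t"), ("a", "aa"), ("b", "b")],  # kitaab "book"
    [("k", "k"), ("a", "a"), ("t", "t"), ("a", "a"), ("b", "b")],   # katab "he wrote"
    [("q", "q"), ("a", "a"), ("l", "l"), ("a", "a"), ("m", "m")],   # qalam "pen"
]
model = train_bigram(lexicon)
print(g2p("kitab", model))  # -> ['k', 'i', 't', 'aa', 'b']
```

For the scores quoted above, PER is conventionally computed as the Levenshtein (edit) distance between the predicted and reference phoneme sequences, divided by the number of reference phonemes.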
Pages: 173-182
Page count: 10
Related papers
50 records in total
  • [1] Automatic Grapheme-to-Phoneme Conversion of Arabic Text
    Al-Daradkah, Belal
    Al-Diri, Bashir
    [J]. 2015 SCIENCE AND INFORMATION CONFERENCE (SAI), 2015, : 468 - 473
  • [2] JOINT ALIGNMENT LEARNING-ATTENTION BASED MODEL FOR GRAPHEME-TO-PHONEME CONVERSION
    Wang, Yonghe
    Bao, Feilong
    Zhang, Hui
    Gao, Guanglai
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7788 - 7792
  • [3] Transformer based Grapheme-to-Phoneme Conversion
    Yolchuyeva, Sevinj
    Nemeth, Geza
    Gyires-Toth, Balint
    [J]. INTERSPEECH 2019, 2019, : 2095 - 2099
  • [4] Joint-sequence models for grapheme-to-phoneme conversion
    Bisani, Maximilian
    Ney, Hermann
    [J]. SPEECH COMMUNICATION, 2008, 50 (05) : 434 - 451
  • [5] Grapheme-to-Phoneme Conversion with a Multilingual Transformer Model
    ElSaadany, Omnia
    Suter, Benjamin
    [J]. 17TH SIGMORPHON WORKSHOP ON COMPUTATIONAL RESEARCH IN PHONETICS, PHONOLOGY, AND MORPHOLOGY (SIGMORPHON 2020), 2020, : 85 - 89
  • [6] Phonetisaurus: Exploring grapheme-to-phoneme conversion with joint n-gram models in the WFST framework
    Novak, Josef Robert
    Minematsu, Nobuaki
    Hirose, Keikichi
    [J]. NATURAL LANGUAGE ENGINEERING, 2016, 22 (06) : 907 - 938
  • [7] Grapheme-to-phoneme conversion of Arabic numeral expressions for embedded TTS systems
    Jung, Youngim
    Yoon, Aesun
    Kwon, Hyuk-Chul
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (01) : 296 - 309
  • [8] DNN-based grapheme-to-phoneme conversion for Arabic text-to-speech synthesis
    Ali, Ikbel Hadj
    Mnasri, Zied
    Lachiri, Zied
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (03) : 569 - 584