An encoder-decoder based grapheme-to-phoneme converter for Bangla speech synthesis

Cited by: 1
Authors
Ahmad, Arif [1]
Selim, Mohammad Reza [1]
Iqbal, Muhammed Zafar [1]
Rahman, Mohammad Shahidur [1]
Affiliations
[1] Shahjalal Univ Sci & Technol, Dept Comp Sci & Engn, Sylhet 3114, Bangladesh
Keywords
Encoder-decoder; Sequence-to-sequence; GRU-RNN; NMT;
DOI
10.1250/ast.40.374
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper proposes an encoder-decoder based sequence-to-sequence model for grapheme-to-phoneme (G2P) conversion in Bangla (exonym: Bengali). G2P models are key components of speech recognition and speech synthesis systems because they describe how words are pronounced. Traditional rule-based models do not perform well in unseen contexts. We propose adopting a neural machine translation (NMT) model to solve the G2P problem, and we built our model with a gated recurrent unit (GRU) recurrent neural network (RNN). In contrast to joint-sequence based G2P models, our encoder-decoder based model does not require explicit grapheme-to-phoneme alignments, which are not straightforward to produce. We trained our model on a pronunciation dictionary of approximately 135,000 entries and obtained a word error rate (WER) of 12.49%, a significant improvement over the existing rule-based and machine-learning based Bangla G2P models.
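The word error rate quoted in the abstract can be illustrated with a minimal sketch. It assumes the convention common in G2P evaluation, where a word counts as an error if its predicted phoneme sequence differs from the reference in any position; the abstract does not state the exact definition used. The function name `g2p_word_error_rate` and the toy phoneme sequences are hypothetical and are not taken from the paper or its dictionary.

```python
from typing import Sequence


def g2p_word_error_rate(
    references: Sequence[Sequence[str]],
    predictions: Sequence[Sequence[str]],
) -> float:
    """Fraction of words whose predicted phoneme sequence is not an exact match.

    Assumption: one reference pronunciation per word, exact-match scoring.
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if not references:
        raise ValueError("at least one word is required")
    # A word is wrong if any phoneme differs or the lengths differ.
    errors = sum(
        1 for ref, hyp in zip(references, predictions) if list(ref) != list(hyp)
    )
    return errors / len(references)


# Hypothetical toy data: four words, one of which is mispredicted.
refs = [["k", "a", "l"], ["b", "o", "i"], ["d", "i", "n"], ["m", "a"]]
hyps = [["k", "a", "l"], ["b", "o", "y"], ["d", "i", "n"], ["m", "a"]]

wer = g2p_word_error_rate(refs, hyps)
print(f"WER = {wer:.2%}")
```

Under this convention, a WER of 12.49% would correspond to roughly one word in eight receiving a pronunciation that differs from the dictionary entry.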
Pages: 374-381
Number of pages: 8