Myanmar Text-to-Speech System based on Tacotron-2

Cited by: 0
Authors
Win, Yuzana [1 ]
Masada, Tomonari [2 ]
Affiliations
[1] Yangon Technol Univ, Dept CEIT, Yangon, Myanmar
[2] Rikkyo Univ, Artificial Intelligence & Sci, Tokyo, Japan
Keywords
Tacotron; Tacotron-2; Syllable Segmenter; Text Normalizer; RNN; LSTM; Griffin-Lim;
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification codes
0808 ; 0809 ;
Abstract
Myanmar is a developing country in South-East Asia, and many areas of advanced natural language processing technology, text-to-speech among them, remain under-developed there. The main motivation of this paper is to improve the naturalness of a Myanmar text-to-speech system so that it generates human-like speech. We apply a neural network architecture based on Tacotron-2 that generates a mel spectrogram for speech synthesis directly from a text sequence. Our proposed method consists of three steps. In the first step, we create a speech corpus of 5k sentences of paired Myanmar text and audio drawn from a large set of news articles, novels, daily usage and travel-related expressions, and we segment the Myanmar text into a sequence of characters using a syllable segmenter and a text normalizer. In the second step, we use a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms. In the final step, we use the Griffin-Lim algorithm to convert the predicted spectrograms into Myanmar speech output. We compare our proposed method with an end-to-end generative model based on Tacotron, and we conduct a subjective evaluation of both methods using the mean opinion score (MOS). The experimental results show that our proposed method improves on Tacotron-based speech synthesis in terms of naturalness and intelligibility.
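The abstract names the mean opinion score as the evaluation measure but does not describe how it is computed. As a minimal sketch, assuming the conventional 1-5 listener rating scale and a normal-approximation confidence interval (neither is confirmed by this record), a MOS comparison between two systems could look like the following. The function name `mean_opinion_score` and every rating in the example are hypothetical and invented for illustration; they are not results from the paper.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% confidence half-width) for listener ratings on a 1-5 scale."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Normal approximation (z = 1.96); with few listeners a t-quantile
    # would give a wider, more appropriate interval.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings, invented for illustration only.
system_a = [4, 5, 4, 3, 4, 5, 4, 4]
system_b = [3, 4, 3, 3, 4, 3, 2, 4]

for name, ratings in (("system A", system_a), ("system B", system_b)):
    mos, hw = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {hw:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether a difference between two systems is larger than the listener-to-listener variation.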
Pages: 578-583
Page count: 6