Myanmar Text-to-Speech System based on Tacotron-2

Cited by: 0
Authors
Win, Yuzana [1]
Masada, Tomonari [2]
Affiliations
[1] Yangon Technol Univ, Dept CEIT, Yangon, Myanmar
[2] Rikkyo Univ, Artificial Intelligence & Sci, Tokyo, Japan
Keywords
Tacotron; Tacotron-2; Syllable Segmenter; Text Normalizer; RNN; LSTM; Griffin-Lim;
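DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code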
0808 ; 0809 ;
Abstract
Myanmar is a developing country in South-East Asia, and many areas of advanced natural language processing technology, including text-to-speech, remain under-developed there. The main motivation of this paper is to improve the naturalness of a Myanmar text-to-speech system so that it generates human-like speech. We apply a neural network architecture based on Tacotron-2, which generates a mel spectrogram for speech synthesis directly from a text sequence. Our proposed method consists of three steps. In the first step, we create a speech corpus of 5k paired text and audio sentences in Myanmar, drawn from a large set of news articles, novels, daily usage and travel-related expressions. We segment the Myanmar text into a sequence of characters using a syllable segmenter and a text normalizer. In the second step, we use a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms. In the final step, we use the Griffin-Lim algorithm to convert the predicted spectrograms for the corresponding text into Myanmar speech output. We compare our proposed method with an end-to-end generative model based on Tacotron, and we evaluate both methods subjectively using the mean opinion score (MOS). The experimental results show that our proposed method improves on Tacotron-based speech synthesis in terms of naturalness and intelligibility.
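The final step of the abstract, Griffin-Lim waveform reconstruction, can be illustrated with a minimal sketch. It assumes NumPy and a linear-magnitude spectrogram as input; the function names, FFT size, hop length and iteration count are illustrative choices, not values taken from the paper, and the conversion from a predicted mel spectrogram back to a linear scale is omitted.

```python
import numpy as np

def stft(signal, n_fft=512, hop=128):
    """Short-time Fourier transform with a Hann window; returns (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Least-squares inverse STFT by windowed overlap-add."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        start = i * hop
        signal[start:start + n_fft] += frames[i] * window
        norm[start:start + n_fft] += window ** 2
    # The floor avoids division by zero at the very first and last samples,
    # where the window is zero; edge samples are less reliable than the interior.
    return signal / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_iter=50, n_fft=512, hop=128, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`.

    Starts from random phase, then alternates between synthesizing a signal
    and re-analyzing it, keeping the target magnitude and the updated phase.
    """
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(signal, n_fft, hop)
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase, n_fft, hop)
```

In a Tacotron-2 style pipeline, `magnitude` would come from the spectrogram predicted by the network after mapping it back to linear frequency bins; here it can be tried on `np.abs(stft(x))` for any test signal `x`.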
Pages: 578-583
Number of pages: 6