Myanmar Text-to-Speech System based on Tacotron-2

Cited by: 0

Authors:

Win, Yuzana [1]

Masada, Tomonari [2]

Affiliations:

[1] Yangon Technol Univ, Dept CEIT, Yangon, Myanmar

[2] Rikkyo Univ, Artificial Intelligence & Sci, Tokyo, Japan

Source:

11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020) | 2020

Keywords:

Tacotron; Tacotron-2; Syllable Segmenter; Text Normalizer; RNN; LSTM; Griffin-Lim;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Subject classification code:

0808 ; 0809 ;

Abstract:

Myanmar is a developing country in South-East Asia, and many areas of advanced natural language processing technology remain under-developed there; text-to-speech is one of them. The main motivation of this paper is to improve the naturalness of a Myanmar text-to-speech system so that it can generate human-like speech. We apply a neural network architecture based on Tacotron-2 that generates a mel spectrogram for speech synthesis directly from a text sequence. Our proposed method is composed of three steps. In the first step, we create a speech corpus of 5k paired text and audio sentences in Myanmar, drawn from a large set of news articles, novels, daily usage and travel-related expressions. We segment the Myanmar text into a sequence of characters by using a syllable segmenter and a text normalizer. In the second step, we use a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms. In the final step, we use the Griffin-Lim algorithm to convert the predicted spectrograms into Myanmar speech output. We compare our proposed method with an end-to-end generative model based on Tacotron. Furthermore, we conduct a subjective evaluation of both methods using the mean opinion score (MOS). The experimental results show that our proposed method improves on Tacotron-based speech synthesis in terms of naturalness and intelligibility.
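The subjective evaluation mentioned in the abstract relies on the mean opinion score. The following is a minimal sketch of how a MOS and an approximate 95% confidence interval might be computed from listener ratings on a 1 to 5 scale. The function name `mean_opinion_score`, the normal-approximation interval, and the ratings shown are illustrative assumptions; they are not the authors' evaluation code or their reported results.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, ci95_half_width) for a list of 1-5 listener ratings.

    Assumption: the interval uses a normal approximation (z = 1.96);
    the record above does not say how the authors computed intervals.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    mos = float(statistics.mean(ratings))
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, 1.96 * std_err


# Hypothetical ratings for two systems (NOT data from the paper).
ratings_system_a = [4, 5, 4, 3, 4, 4, 5, 3]
ratings_system_b = [3, 3, 4, 2, 3, 4, 3, 3]

for name, ratings in [("A", ratings_system_a), ("B", ratings_system_b)]:
    mos, half_width = mean_opinion_score(ratings)
    print(f"System {name}: MOS = {mos:.2f} +/- {half_width:.2f}")
```

In practice each system would be rated by several listeners over many synthesized sentences, and the per-system scores compared; overlapping intervals suggest that a difference may not be reliable.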


Pages: 578-583

Page count: 6
