Myanmar Text-to-Speech System based on Tacotron-2

Cited by: 0

Authors:

Win, Yuzana [1]

Masada, Tomonari [2]

Affiliations:

[1] Yangon Technol Univ, Dept CEIT, Yangon, Myanmar

[2] Rikkyo Univ, Artificial Intelligence & Sci, Tokyo, Japan

Source:

11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020) | 2020

Keywords:

Tacotron; Tacotron-2; Syllable Segmenter; Text Normalizer; RNN; LSTM; Griffin-Lim

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

Myanmar is a developing country in South-East Asia, and many advanced natural language processing technologies, text-to-speech among them, remain under-developed for its language. The main motivation of this paper is to improve the naturalness of a Myanmar text-to-speech system so that it generates human-like speech. We apply a neural network architecture based on Tacotron-2 that generates a mel spectrogram for speech synthesis directly from a sequence of text. Our proposed method is composed of three steps. In the first step, we create a speech corpus of 5k sentences of paired Myanmar text and audio drawn from a large set of news articles, novels, daily usage and travel-related expressions, and we segment the Myanmar text into a sequence of characters by using a syllable segmenter and a text normalizer. In the second step, we use a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms. In the final step, we use the Griffin-Lim algorithm to generate the Myanmar speech output for the corresponding text. We compare our proposed method with an end-to-end generative model based on Tacotron, and we carry out a subjective evaluation of both methods using the mean opinion score (MOS). The experimental results show that our proposed method improves on Tacotron-based speech synthesis in terms of naturalness and intelligibility.
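The abstract reports a subjective comparison by mean opinion score but gives no details of how it was computed. The following is a minimal sketch of a common way to summarize MOS ratings, assuming a 1-5 rating scale and a normal-approximation 95% confidence interval. The function name `mean_opinion_score`, the system labels, and the ratings are all hypothetical placeholders and are not taken from the paper.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of 1-5 listener ratings.

    Assumption: ratings are independent and the normal approximation is
    acceptable; the paper does not state which interval, if any, it used.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Half-width = 1.96 * standard error of the mean (sample std / sqrt(n)).
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Placeholder ratings invented for illustration only (NOT results from the paper).
example_ratings = {
    "system_a": [4, 4, 5, 3, 4, 4, 5, 4],
    "system_b": [3, 4, 3, 3, 4, 2, 3, 4],
}

for name, ratings in example_ratings.items():
    mos, ci = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether a difference between two systems is larger than listener-to-listener variation.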


Pages: 578-583

Page count: 6
