The First Vietnamese FOSD-Tacotron-2-based Text-to-Speech Model Dataset

Cited by: 4
Authors
Tran, Duc Chung [1]
Affiliations
[1] FPT Univ, Comp Fundamental Dept, Hoa Lac Hi Tech Pk, Hanoi 155300, Vietnam
Source
DATA IN BRIEF | 2020 / Vol. 31
Keywords
Text-to-speech; Natural language processing; Natural language generation; Vietnamese; Speech; Dataset; Tacotron; WaveNet;
DOI
10.1016/j.dib.2020.105775
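Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code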
07 ; 0710 ; 09 ;
Abstract
Recent trends in voicebot application development have enabled the use of both speech-to-text and text-to-speech (TTS) generation techniques. To generate a voice response to a given speech input, one needs a TTS engine. Recently developed TTS engines are shifting towards end-to-end approaches built on models such as Tacotron, Tacotron2, WaveNet, and WaveGlow. This shift lets a TTS service provider focus on developing training and validation datasets comprising labelled texts and recorded speech, instead of designing an entirely new model that outperforms the others, which is time-consuming and costly. In this context, this work introduces the first Vietnamese FPT Open Speech Data (FOSD)-Tacotron-2-based TTS model dataset. The dataset comprises a configuration file in *.json format; training and validation text input files (in *.csv format); a 225,000-step checkpoint of the trained model; and several sample generated audio files. The published dataset is valuable as a reference model for benchmarking against other newly developed TTS models and engines. In addition, it opens an entirely new TTS research optimization problem: how to effectively generate speech from text given a black-box (trained) TTS model and its training and validation input texts. © 2020 The Author(s). Published by Elsevier Inc.
Pages: 5