The First Vietnamese FOSD-Tacotron-2-based Text-to-Speech Model Dataset

Cited by: 4
Authors
Tran, Duc Chung [1]
Affiliations
[1] FPT Univ, Comp Fundamental Dept, Hoa Lac Hi Tech Pk, Hanoi 155300, Vietnam
Source
DATA IN BRIEF | 2020 / Vol. 31
Keywords
Text-to-speech; Natural language processing; Natural language generation; Vietnamese; Speech; Dataset; Tacotron; WaveNet;
DOI
10.1016/j.dib.2020.105775
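Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract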
Recent trends in voicebot application development have enabled the use of both speech-to-text and text-to-speech (TTS) generation techniques. To generate a voice response to a given speech input, one needs a TTS engine. Recently developed TTS engines are shifting towards end-to-end approaches that use models such as Tacotron, Tacotron2, WaveNet, and WaveGlow. The reason is that this shift enables a TTS service provider to focus on developing training and validation datasets comprising labelled texts and recorded speech instead of designing an entirely new model that outperforms the others, which is time-consuming and costly. In this context, this work introduces the first Vietnamese FPT Open Speech Data (FOSD)-Tacotron-2-based TTS model dataset. The dataset comprises a configuration file in *.json format; training and validation text input files (in *.csv format); a 225,000-step checkpoint of the trained model; and several sample generated audio files. The published dataset is valuable as a baseline for benchmarking against other newly developed TTS models and engines. In addition, it opens an entirely new TTS research optimization problem: how to effectively generate speech from text given a black-box (trained) TTS model and its training and validation input texts. (C) 2020 The Author(s). Published by Elsevier Inc.
Pages: 5