RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis

Cited by: 3
Authors
Zandie, Rohola [1, 2]
Mahoor, Mohammad H. [1, 2]
Madsen, Julia [2]
Emamian, Eshrat S. [2]
Affiliations
[1] University of Denver, Department of Electrical & Computer Engineering, Denver, CO 80208, USA
[2] DreamFace Technologies LLC, Denver, CO 80111, USA
Source
Interspeech 2021
Funding
National Institutes of Health (NIH)
Keywords
text to speech; speech corpus; speech recognition
DOI
10.21437/Interspeech.2021-341
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
This paper introduces RyanSpeech, a new speech corpus for research on automated text-to-speech (TTS) systems. Publicly available TTS corpora are often noisy, recorded with multiple speakers, or lack high-quality male speech data. To meet the need for a high-quality, publicly available male speech corpus within the field of speech recognition, we designed and created RyanSpeech, which contains textual materials from real-world conversational settings. The corpus contains over 10 hours of speech from a professional male voice actor, recorded at 44.1 kHz. Its design and pipeline make RyanSpeech ideal for developing TTS systems for real-world applications. To provide a baseline for future research, protocols, and benchmarks, we trained four state-of-the-art speech models and a vocoder on RyanSpeech. Our best model achieves a mean opinion score (MOS) of 3.36. We have made both the corpus and the trained models publicly available.
Pages: 2751-2755
Page count: 5