RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis

Cited by: 3

Authors:

Zandie, Rohola ^{[1,2]}

Mahoor, Mohammad H. ^{[1,2]}

Madsen, Julia ^{[2]}

Emamian, Eshrat S. ^{[2]}

Affiliations:

[1] Univ Denver, Dept Elect & Comp Engn, Denver, CO 80208 USA

[2] DreamFace Technol LLC, Denver, CO 80111 USA

Source:

INTERSPEECH 2021 | 2021

Funding:

National Institutes of Health (USA);

Keywords:

text to speech; speech corpus; speech recognition;

DOI:

10.21437/Interspeech.2021-341

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

This paper introduces RyanSpeech, a new speech corpus for research on automated text-to-speech (TTS) systems. Publicly available TTS corpora are often noisy, recorded with multiple speakers, or lack quality male speech data. To meet the need for a high-quality, publicly available male speech corpus within the field of speech recognition, we have designed and created RyanSpeech, which contains textual materials from real-world conversational settings. These materials contain over 10 hours of a professional male voice actor's speech recorded at 44.1 kHz. The corpus's design and pipeline make RyanSpeech ideal for developing TTS systems in real-world applications. To provide a baseline for future research, protocols, and benchmarks, we trained 4 state-of-the-art speech models and a vocoder on RyanSpeech. Our best model achieves a mean opinion score (MOS) of 3.36. We have made both the corpus and the trained models available for public use.

Pages: 2751-2755

Page count: 5
