RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis

Times cited: 3

Authors:

Zandie, Rohola [1,2]

Mahoor, Mohammad H. [1,2]

Madsen, Julia [2]

Emamian, Eshrat S. [2]

Affiliations:

[1] University of Denver, Department of Electrical and Computer Engineering, Denver, CO 80208, USA

[2] DreamFace Technologies LLC, Denver, CO 80111, USA

Source:

INTERSPEECH 2021 | 2021

Funding:

United States National Institutes of Health (NIH);

Keywords:

text to speech; speech corpus; speech recognition;

DOI:

10.21437/Interspeech.2021-341

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Subject classification code:

100104; 100213;

Abstract:

This paper introduces RyanSpeech, a new speech corpus for research on automated text-to-speech (TTS) systems. Publicly available TTS corpora are often noisy, recorded with multiple speakers, or lack high-quality male speech data. To meet the need for a high-quality, publicly available male speech corpus within the field of speech recognition, we designed and created RyanSpeech, which contains textual materials from real-world conversational settings. The corpus contains over 10 hours of speech by a professional male voice actor, recorded at 44.1 kHz. Its design and pipeline make RyanSpeech ideal for developing TTS systems for real-world applications. To provide a baseline for future research, protocols, and benchmarks, we trained four state-of-the-art speech synthesis models and a vocoder on RyanSpeech. Our best model achieves a mean opinion score (MOS) of 3.36. Both the corpus and the trained models are publicly available.
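The mean opinion score (MOS) mentioned in the abstract is conventionally the arithmetic mean of individual listener ratings. The sketch below shows that computation together with an approximate 95% confidence interval. The 1-5 rating scale, the normal approximation (z = 1.96), the function name `mean_opinion_score`, and the sample ratings are all assumptions for illustration; none of them are taken from the RyanSpeech paper, and the ratings are not data from it.

```python
import math
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumption: `ratings` are individual listener scores on a 1-5 scale.
    The interval uses the normal approximation with z = 1.96.
    """
    if len(ratings) < 2:
        # statistics.stdev needs at least two values
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    # Standard error of the mean, scaled to a 95% interval
    half_width = 1.96 * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from five listeners for one synthesized utterance set
mos, ci = mean_opinion_score([4, 3, 4, 3, 3])
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Because the interval narrows roughly as 1/√n in the number of ratings, reporting it alongside the MOS makes it easier to judge whether differences between baseline models are meaningful.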


Pages: 2751-2755

Number of pages: 5
