RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis

Cited by: 3
Authors
Zandie, Rohola [1, 2]
Mahoor, Mohammad H. [1, 2]
Madsen, Julia [2]
Emamian, Eshrat S. [2]
Affiliations
[1] Univ Denver, Dept Elect & Comp Engn, Denver, CO 80208 USA
[2] DreamFace Technol LLC, Denver, CO 80111 USA
Source
Funding
National Institutes of Health (USA)
Keywords
text to speech; speech corpus; speech recognition
DOI
10.21437/Interspeech.2021-341
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
This paper introduces RyanSpeech, a new speech corpus for research on automated text-to-speech (TTS) systems. Publicly available TTS corpora are often noisy, recorded with multiple speakers, or lacking in quality male speech data. To meet the need for a high-quality, publicly available male speech corpus within the field of speech recognition, we have designed and created RyanSpeech, which contains textual materials from real-world conversational settings. These materials comprise over 10 hours of a professional male voice actor's speech recorded at 44.1 kHz. The corpus's design and pipeline make RyanSpeech well suited to developing TTS systems for real-world applications. To provide a baseline for future research, protocols, and benchmarks, we trained four state-of-the-art speech models and a vocoder on RyanSpeech. Our best model achieves a mean opinion score (MOS) of 3.36. We have made both the corpus and the trained models available for public use.
Pages: 2751-2755
Page count: 5