Strategies for developing a conversational speech dataset for Text-To-Speech Synthesis

Authors
Adigwe, Adaeze O. [1, 2]
Klabbers, Esther [2]
Affiliations
[1] Univ Helsinki, Helsinki, Finland
[2] ReadSpeaker, Driebergen-Rijsenburg, Netherlands
Source
Interspeech 2022
Funding
EU Horizon 2020
Keywords
conversational text-to-speech; speaking styles; prosody; speech corpus
DOI
10.21437/Interspeech.2022-10802
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
There have been many efforts to improve the quality of speech synthesis systems in conversational AI. Although state-of-the-art systems are capable of producing natural-sounding speech, the generated speech often lacks prosodic variation and is not always suited to the task. In this paper, we examine dialogue data collection methods to use as training data for our acoustic models. We collect speech using three different setups: (1) Random read-aloud sentences; (2) Performed dialogues; (3) Semi-Spontaneous dialogues. We analyze prosodic and textual properties of the data collected in these setups and make some recommendations to collect data for speech synthesis in conversational AI settings.
Pages: 2318-2322
Number of pages: 5