Emotional Speech Synthesis for Multi-Speaker Emotional Dataset Using WaveNet Vocoder

Times cited: 0
Authors
Choi, Heejin [1]
Park, Sangjun [1]
Park, Jinuk [1]
Hahn, Minsoo [1]
Affiliation
[1] Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Keywords
None listed
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
This paper studies methods for emotional speech synthesis using a neural vocoder. WaveNet, which generates waveforms from mel spectrograms, is used as the neural vocoder. We propose two networks, a deep convolutional neural network (CNN)-based text-to-speech (TTS) system and an emotional converter, and design the deep CNN architecture to exploit long-term context information. The first network estimates neutral mel spectrograms from linguistic features, and the second converts neutral mel spectrograms into emotional mel spectrograms. Experimental results on the TTS system and the emotional TTS system show that the proposed systems are a promising approach.
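The abstract describes a two-stage acoustic pipeline: linguistic features pass through a CNN-based TTS network to produce neutral mel spectrograms, a second network converts those into emotional mel spectrograms, and WaveNet renders the waveform. The NumPy sketch below illustrates only that data flow, under loudly stated assumptions. This record does not give the layer types, so residual dilated 1-D convolutions (one common way to widen temporal context) stand in for the "deep CNN". `ConvStack`, `dilated_conv1d`, the feature sizes (40-dim linguistic input, 80 mel bins), and the random weights are all hypothetical. The WaveNet stage is not implemented.

```python
import numpy as np


def dilated_conv1d(x, w, dilation):
    """Non-causal dilated 1-D convolution with zero 'same' padding.

    x: (T, C_in) frame sequence; w: (K, C_in, C_out) kernel with odd K.
    Output frame t mixes input frames t - d*(K-1)/2 ... t + d*(K-1)/2.
    """
    K = w.shape[0]
    assert K % 2 == 1, "odd kernel size keeps the padding symmetric"
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along time only
    T = x.shape[0]
    out = np.zeros((T, w.shape[2]))
    for k in range(K):
        # Tap k reads input frame t + (k - (K - 1) // 2) * dilation.
        out += xp[k * dilation : k * dilation + T] @ w[k]
    return out


class ConvStack:
    """Hypothetical frame-to-frame CNN: 1x1 input projection, residual
    dilated conv layers with tanh, 1x1 output projection. Random weights
    stand in for trained ones."""

    def __init__(self, c_in, c_hidden, c_out, dilations, kernel=3, seed=0):
        rng = np.random.default_rng(seed)
        self.kernel = kernel
        self.dilations = list(dilations)
        self.w_in = rng.standard_normal((c_in, c_hidden)) / np.sqrt(c_in)
        self.ws = [
            rng.standard_normal((kernel, c_hidden, c_hidden)) / np.sqrt(kernel * c_hidden)
            for _ in self.dilations
        ]
        self.w_out = rng.standard_normal((c_hidden, c_out)) / np.sqrt(c_hidden)

    def receptive_field(self):
        # Each layer widens the context by (K - 1) * dilation frames.
        return 1 + sum((self.kernel - 1) * d for d in self.dilations)

    def __call__(self, x):
        h = x @ self.w_in
        for w, d in zip(self.ws, self.dilations):
            h = h + np.tanh(dilated_conv1d(h, w, d))  # residual connection
        return h @ self.w_out


# Hypothetical sizes: 40-dim linguistic features, 80 mel bins, 200 frames.
DILATIONS = [1, 2, 4, 8, 16, 32]
rng = np.random.default_rng(1)
linguistic = rng.standard_normal((200, 40))

tts = ConvStack(40, 32, 80, DILATIONS, seed=0)        # stage 1: linguistic -> neutral mel
converter = ConvStack(80, 32, 80, DILATIONS, seed=1)  # stage 2: neutral -> emotional mel

neutral_mel = tts(linguistic)
emotional_mel = converter(neutral_mel)
# A WaveNet vocoder would then map emotional_mel to a waveform (not sketched here).
print(neutral_mel.shape, emotional_mel.shape, tts.receptive_field())
```

With kernel size 3 and dilations 1, 2, 4, ..., 32, each output frame sees a 127-frame window. This is one way to realize the "long-term context" the abstract mentions. The paper's actual layer configuration and training procedure are not given in this record.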
Pages: 2
Related Papers
50 in total
  • [41] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION
    Settle, Shane
    Le Roux, Jonathan
    Hori, Takaaki
    Watanabe, Shinji
    Hershey, John R.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4819 - 4823
  • [42] End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning
    Denisov, Pavel
    Ngoc Thang Vu
    [J]. INTERSPEECH 2019, 2019, : 4425 - 4429
  • [43] Zero-Shot Normalization Driven Multi-Speaker Text to Speech Synthesis
    Kumar, Neeraj
    Narang, Ankur
    Lall, Brejesh
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1679 - 1693
  • [44] TOWARDS MULTI-SPEAKER UNSUPERVISED SPEECH PATTERN DISCOVERY
    Zhang, Yaodong
    Glass, James R.
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4366 - 4369
  • [45] A Canadian French Emotional Speech Dataset
    Gournay, Philippe
    Lahaie, Olivier
    Lefebvre, Roch
    [J]. PROCEEDINGS OF THE 9TH ACM MULTIMEDIA SYSTEMS CONFERENCE (MMSYS'18), 2018, : 399 - 402
  • [46] Development of Japanese Paralinguistic Information Transmission Tests for Assessment of Hearing-assistance Devices Utilizing Multi-Speaker and Emotional Speech Corpora
    Kagomiya, Takayuki
    Nakagawa, Seiji
    [J]. 2014 17TH ORIENTAL CHAPTER OF THE INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDIZATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (COCOSDA), 2014,
  • [47] MULTI-SPEAKER, NARROWBAND, CONTINUOUS MARATHI SPEECH DATABASE
    Godambe, Tejas
    Bondale, Nandini
    Samudravijaya, K.
    Rao, Preeti
    [J]. 2013 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2013 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE), 2013,
  • [48] Speech Recognition and Multi-Speaker Diarization of Long Conversations
    Mao, Huanru Henry
    Li, Shuyang
    McAuley, Julian
    Cottrell, Garrison W.
    [J]. INTERSPEECH 2020, 2020, : 691 - 695
  • [49] An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis
    Lorincz, Beata
    Stan, Adriana
    Giurgiu, Mircea
    [J]. KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KSE 2021), 2021, 192 : 756 - 765
  • [50] Morphological and Acoustic Analysis of the Vocal Tract Using a Multi-Speaker Volumetric MRI Dataset
    Kaburagi, Tokihiko
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 379 - 383