Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder

Cited by: 0
Authors
Akuzawa, Kei [1]
Iwasawa, Yusuke [1]
Matsuo, Yutaka [1]
Affiliation
[1] University of Tokyo, Graduate School of Engineering, Tokyo, Japan
Keywords
autoregressive model; variational autoencoder; expressive speech synthesis
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances in neural autoregressive models have improved the performance of speech synthesis (SS). However, because these models lack the ability to model global characteristics of speech (such as speaker individuality or speaking style), particularly when those characteristics are unlabeled, making neural autoregressive SS systems more expressive remains an open issue. In this paper, we propose combining VoiceLoop, an autoregressive SS model, with a Variational Autoencoder (VAE). Unlike traditional autoregressive SS systems, this approach uses the VAE to model global characteristics explicitly, which makes it possible to control the expressiveness of the synthesized speech in an unsupervised manner. Experiments on the VCTK and Blizzard2012 datasets show that, by incorporating global characteristics into the speech generation process, the VAE helps VoiceLoop generate higher-quality speech and control the expressions in its synthesized speech.
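The core idea, a VAE that summarizes a whole utterance into one global latent vector which then conditions every step of an autoregressive decoder, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' VoiceLoop implementation: the mean-pooling encoder, the single linear layers, and all function names (`encode_global`, `reparameterize`, `kl_to_standard_normal`, `condition_decoder_inputs`) are hypothetical. What the sketch does show are the standard VAE components: the reparameterization trick and the closed-form KL divergence to a standard-normal prior.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch of an utterance-level VAE; not VoiceLoop's actual architecture.
Vector = List[float]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average frame-level features over time into one utterance-level vector."""
    n, dim = len(frames), len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def linear(x: Vector, weights: Sequence[Vector], bias: Vector) -> Vector:
    """Dense layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def encode_global(frames: Sequence[Vector],
                  w_mu: Sequence[Vector], b_mu: Vector,
                  w_logvar: Sequence[Vector], b_logvar: Vector) -> Tuple[Vector, Vector]:
    """Parameters (mu, log-variance) of q(z | x): one latent per utterance."""
    pooled = mean_pool(frames)
    return linear(pooled, w_mu, b_mu), linear(pooled, w_logvar, b_logvar)


def reparameterize(mu: Vector, logvar: Vector, rng: random.Random) -> Vector:
    """Sample z = mu + sigma * eps, eps ~ N(0, I), keeping z differentiable in mu and logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Vector, logvar: Vector) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def condition_decoder_inputs(decoder_inputs: Sequence[Vector], z: Vector) -> List[Vector]:
    """Append the same global z to the input of every autoregressive decoder step."""
    return [list(x) + list(z) for x in decoder_inputs]


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0]]  # two frames of 2-dim toy features
    identity, zeros = [[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]
    mu, logvar = encode_global(frames, identity, [0.0, 0.0], zeros, [0.0, 0.0])
    z = reparameterize(mu, logvar, random.Random(0))
    print(mu, logvar, kl_to_standard_normal(mu, logvar))  # [2.0, 3.0] [0.0, 0.0] 6.5
    print(condition_decoder_inputs([[0.5], [0.6]], z))
```

In a full system the KL term would be added to the decoder's reconstruction loss during training. Because z is shared across all decoder steps, it can only carry utterance-wide information; one plausible way to steer style at synthesis time is to set z directly (for example, to the posterior mean of a reference utterance) instead of sampling it.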
Pages: 3067–3071
Number of pages: 5