Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody

Cited by: 0
Authors
Lazaridis, Alexandros [1]
Cernak, Milos [1]
Garner, Philip N. [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Funding
Swiss National Science Foundation;
Keywords
Probabilistic amplitude demodulation; speech synthesis; deep neural networks; speech prosody;
DOI
10.21437/Interspeech.2016-258
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Amplitude demodulation (AM) is a signal decomposition technique by which a signal can be decomposed into a product of two signals, i.e., a quickly varying carrier and a slowly varying modulator. In this work, probabilistic amplitude demodulation (PAD) features are used to improve prosody in speech synthesis. PAD is applied iteratively to generate syllable and stress amplitude modulations in a cascade manner. The PAD features are used as a secondary input scheme along with the standard text-based input features in statistical parametric speech synthesis. Specifically, deep neural network (DNN)-based speech synthesis is used to evaluate the importance of these features. Objective evaluation shows that the proposed system using the PAD features mainly improves prosody modelling; it outperforms the baseline system by approximately 5% in terms of relative reduction in root mean square error (RMSE) of the fundamental frequency (F0). The significance of this improvement is validated by subjective evaluation of the overall speech quality in an ABX test, where the proposed system achieves a 38.6% preference score against 19.5% for the baseline system.
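The carrier/modulator decomposition and the cascade idea described in the abstract can be illustrated with a minimal sketch. This is not the probabilistic method (PAD) used in the paper: it assumes a much simpler stand-in, rectification followed by a moving-average low-pass filter, purely to show how a signal factors into a slowly varying modulator times a quickly varying carrier, and how demodulating a modulator again yields an even slower one. The function names, window widths, and synthetic signal are all hypothetical.

```python
import math

def moving_average(x, width):
    """Centered moving average, truncated at the edges (a crude low-pass filter)."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def demodulate(signal, width, eps=1e-8):
    """Return (modulator, carrier) such that signal[i] ~= modulator[i] * carrier[i].

    Assumption: the modulator is estimated by smoothing |signal|. PAD instead
    infers it probabilistically, which this sketch does not attempt.
    """
    rectified = [abs(s) for s in signal]
    modulator = [max(m, eps) for m in moving_average(rectified, width)]
    carrier = [s / m for s, m in zip(signal, modulator)]
    return modulator, carrier

def cascade(signal, widths):
    """Demodulate repeatedly: each stage demodulates the previous stage's modulator."""
    stages = []
    current = signal
    for width in widths:
        modulator, carrier = demodulate(current, width)
        stages.append((modulator, carrier))
        current = modulator
    return stages

# Synthetic example: a fast oscillation shaped by a slower envelope.
n = 800
demo_signal = [
    (1.0 + 0.5 * math.sin(2 * math.pi * 4 * t / n)) * math.sin(2 * math.pi * 100 * t / n)
    for t in range(n)
]
demo_stages = cascade(demo_signal, widths=[21, 201])
```

Each entry of `demo_stages` holds one level of the cascade. In the paper's setting the two levels correspond to syllable and stress modulations; here the window widths only control how slowly each estimated modulator is allowed to vary.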
Pages: 2298-2302
Number of pages: 5
Related papers
50 in total
  • [11] Prosody modelling of Spanish for expressive speech synthesis
    Iriondo, Ignasi
    Socoro, Joan Claudi
    Alias, Francesc
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 821 - +
  • [12] Speech Recognition with Word Fragment Detection Using Prosody Features for Spontaneous Speech
    Yeh, Jui-Feng
    Yen, Ming-Chi
    APPLIED MATHEMATICS & INFORMATION SCIENCES, 2012, 6 (02): : 669S - 675S
  • [13] Prosody analysis and modeling for emotional speech synthesis
    Jiang, DN
    Zhang, W
    Shen, LQ
    Cai, LH
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 281 - 284
  • [14] Discourse Prosody and Its Application to Speech Synthesis
    Hu, Na
    Shao, Pengfei
    Zu, Yiqing
    Wang, Zuyan
    Huang, Wei
    Wang, Shijin
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [15] Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
    Li, Xiang
    Liu, Songxiang
    Lam, Max W. Y.
    Wu, Zhiyong
    Weng, Chao
    Meng, Helen
    INTERSPEECH 2023, 2023, : 4858 - 4862
  • [16] Estimating Mutual Information in Prosody Representation for Emotional Prosody Transfer in Speech Synthesis
    Zhang, Guangyan
    Qiu, Shirong
    Qin, Ying
    Lee, Tan
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [17] IMPROVING PROSODY MODELLING WITH CROSS-UTTERANCE BERT EMBEDDINGS FOR END-TO-END SPEECH SYNTHESIS
    Xu, Guanghui
    Song, Wei
    Zhang, Zhengchen
    Zhang, Chao
    He, Xiaodong
    Zhou, Bowen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6079 - 6083
  • [18] STATISTICAL INFERENCE FOR SINGLE- AND MULTI-BAND PROBABILISTIC AMPLITUDE DEMODULATION
    Turner, Richard E.
    Sahani, Maneesh
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5466 - 5469
  • [19] Improving the Prosody of RNN-based English Text-To-Speech Synthesis by Incorporating a BERT model
    Kenter, Tom
    Sharma, Manish
    Clark, Rob
    INTERSPEECH 2020, 2020, : 4412 - 4416
  • [20] Synthesis of emotional speech by prosody modification of vowel segments of neutral speech
    Fahad M.S.
    Singh S.
    Gupta S.
    Deepak A.
    Abhinav
    Recent Advances in Computer Science and Communications, 2021, 14 (04) : 1226 - 1235