Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody

Cited by: 0
Authors
Lazaridis, Alexandros [1]
Cernak, Milos [1]
Garner, Philip N. [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016
Funding
Swiss National Science Foundation
Keywords
Probabilistic amplitude demodulation; speech synthesis; deep neural networks; speech prosody;
DOI
10.21437/Interspeech.2016-258
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206 ; 082403 ;
Abstract
Amplitude demodulation (AM) is a signal decomposition technique by which a signal can be decomposed into a product of two signals, i.e., a quickly varying carrier and a slowly varying modulator. In this work, probabilistic amplitude demodulation (PAD) features are used to improve prosody in speech synthesis. PAD is applied iteratively to generate syllable and stress amplitude modulations in a cascade manner. The PAD features are used as a secondary input scheme along with the standard text-based input features in statistical parametric speech synthesis. Specifically, deep neural network (DNN)-based speech synthesis is used to evaluate the importance of these features. Objective evaluation has shown that the proposed system using the PAD features mainly improves prosody modelling; it outperforms the baseline system by approximately 5% in terms of relative reduction in root mean square error (RMSE) of the fundamental frequency (F0). The significance of this improvement is validated by subjective evaluation of the overall speech quality in an ABX test, where the proposed system achieves a preference score of 38.6% against 19.5% for the baseline system.
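The decomposition described in the abstract (signal = slowly varying modulator x quickly varying carrier) can be illustrated with a minimal sketch. This is not PAD and not the authors' method: it swaps in a simple deterministic rectify-and-smooth envelope estimator, and the function names (moving_average, demodulate, cascade), the window widths, and the synthetic test signal are all assumptions made for illustration only.

```python
import math


def moving_average(x, width):
    """Centered moving average, truncated at the edges (a crude low-pass filter)."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def demodulate(signal, width, eps=1e-8):
    """Split a signal into (modulator, carrier) so that modulator * carrier == signal.

    The modulator is the smoothed absolute value of the signal (assumption:
    a stand-in for the probabilistic envelope inference that PAD performs).
    """
    smoothed = moving_average([abs(s) for s in signal], width)
    modulator = [max(m, eps) for m in smoothed]  # keep strictly positive
    carrier = [s / m for s, m in zip(signal, modulator)]
    return modulator, carrier


def cascade(signal, widths):
    """Demodulate repeatedly: each stage demodulates the previous modulator.

    Wider windows give slower envelopes; the correspondence to syllable and
    stress time scales is only suggested here, not calibrated.
    """
    envelopes = []
    current = signal
    for width in widths:
        current, _ = demodulate(current, width)
        envelopes.append(current)
    return envelopes


# Synthetic example (hypothetical values): a 100 Hz carrier modulated at 4 Hz.
fs = 1000  # samples per second
t = [n / fs for n in range(fs)]
true_modulator = [1.0 + 0.5 * math.sin(2 * math.pi * 4 * ti) for ti in t]
signal = [m * math.sin(2 * math.pi * 100 * ti) for m, ti in zip(true_modulator, t)]

modulator, carrier = demodulate(signal, width=51)
envelopes = cascade(signal, widths=[51, 251])
```

The product of the returned modulator and carrier reproduces the input by construction, so the interesting design choice is only how the envelope is estimated; the smoothing window plays the role that the modulator's time-scale prior plays in a probabilistic formulation.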
Pages: 2298-2302
Number of pages: 5