Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody

Citations: 0

Authors:

Lazaridis, Alexandros [1]

Cernak, Milos [1]

Garner, Philip N. [1]

Affiliations:

[1] Idiap Research Institute, Martigny, Switzerland

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Funding:

Swiss National Science Foundation;

Keywords:

Probabilistic amplitude demodulation; speech synthesis; deep neural networks; speech prosody;

DOI:

10.21437/Interspeech.2016-258

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Amplitude demodulation (AM) is a signal decomposition technique by which a signal can be decomposed into a product of two signals, i.e., a quickly varying carrier and a slowly varying modulator. In this work, probabilistic amplitude demodulation (PAD) features are used to improve prosody in speech synthesis. PAD is applied iteratively to generate syllable and stress amplitude modulations in a cascade manner. The PAD features are used as a secondary input scheme alongside the standard text-based input features in statistical parametric speech synthesis. Specifically, deep neural network (DNN)-based speech synthesis is used to evaluate the importance of these features. Objective evaluation shows that the proposed system using the PAD features mainly improves prosody modelling; it outperforms the baseline system by approximately 5% in terms of relative reduction in root mean square error (RMSE) of the fundamental frequency (F0). The significance of this improvement is validated by a subjective evaluation of overall speech quality in an ABX test, in which the proposed system achieves a preference score of 38.6% against 19.5% for the baseline system.
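The decomposition described in the abstract (signal = modulator × carrier, applied repeatedly in a cascade) can be illustrated with a minimal sketch. This is not the probabilistic method (PAD) used in the paper: it substitutes a plain moving-RMS envelope for the probabilistic inference, and the function names (`moving_rms`, `demodulate`, `cascade`) and the window lengths are assumptions chosen only for illustration.

```python
import math


def moving_rms(x, window):
    """Centered moving root-mean-square of x; window is a length in samples."""
    half = window // 2
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        seg = x[lo:hi]
        out.append(math.sqrt(sum(v * v for v in seg) / len(seg)))
    return out


def demodulate(signal, window, eps=1e-12):
    """Split signal into (modulator, carrier) so that modulator * carrier == signal.

    Assumption: the slow modulator is approximated by a moving RMS envelope,
    a simple stand-in for the probabilistic estimate used by PAD.
    """
    modulator = [max(m, eps) for m in moving_rms(signal, window)]
    carrier = [s / m for s, m in zip(signal, modulator)]
    return modulator, carrier


def cascade(signal, windows):
    """Demodulate repeatedly: each stage re-demodulates the previous modulator.

    Returns the slowest modulator and the list of carriers, fastest first.
    """
    carriers = []
    current = signal
    for window in windows:
        current, carrier = demodulate(current, window)
        carriers.append(carrier)
    return current, carriers


if __name__ == "__main__":
    fs = 1000  # hypothetical sampling rate in Hz
    t = [n / fs for n in range(2000)]
    # Synthetic test signal: 100 Hz carrier shaped by a slow 2 Hz modulator.
    slow = [1.0 + 0.5 * math.sin(2 * math.pi * 2 * x) for x in t]
    y = [m * math.sin(2 * math.pi * 100 * x) for m, x in zip(slow, t)]
    slowest, carriers = cascade(y, [20, 400])
    print(len(slowest), len(carriers))
```

Each stage uses a longer window than the one before, so successive modulators vary more slowly. This loosely mirrors the cascade of syllable-level and stress-level modulations mentioned in the abstract, although the window lengths here are not taken from the paper.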


Pages: 2298-2302

Number of pages: 5
