Prosody analysis and modeling for emotional speech synthesis

Cited by: 0

Authors:

Jiang, DN

Zhang, W

Shen, LQ

Cai, LH

Source:

2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING | 2005

DOI: not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Current concatenative Text-to-Speech systems can synthesize varied emotions, but the subtlety and range of the results are limited because a large amount of emotional speech data is required. This paper studies a more flexible approach based on analyzing and modeling emotional prosody features. Perceptual tests are first performed to investigate whether manipulating prosody features alone can achieve the communicative purposes of emotions. Then, based on the positive results, a single corpus with sufficient prosody coverage is shared by different emotions in unit selection. Finally, an adaptation algorithm is proposed to predict the emotional prosody features. It models the prosodic variations by linguistic cues and emotion cues separately, and requires only a small amount of data. Experiments on Mandarin show that the adaptation algorithm can obtain appropriate emotional prosody features, and that at least several emotions can be synthesized without a special emotional corpus.

Pages: 281-284

Page count: 4
