Prosody conversion from neutral speech to emotional speech

Cited by: 135
Authors
Tao, Jianhua [1]
Kang, Yongguo
Li, Aijun
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100080, Peoples R China
[2] Chinese Acad Social Sci, Inst Linguist, Beijing 100732, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
emotional speech; prosody analysis; speech synthesis
DOI
10.1109/TASL.2006.876113
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Emotion is an important element in expressive speech synthesis. Unlike traditional discrete emotion simulations, this paper attempts to synthesize emotional speech by using "strong," "medium," and "weak" classifications. It tests three models: a linear modification model (LMM), a Gaussian mixture model (GMM), and a classification and regression tree (CART) model. The linear modification model directly modifies sentence F0 contours and syllabic durations based on acoustic distributions of emotional speech, such as F0 topline, F0 baseline, durations, and intensities. Further analysis shows that emotional speech is also related to stress and linguistic information. Unlike the linear modification method, the GMM and CART models try to map the subtle prosody distributions between neutral and emotional speech. While the GMM uses only the features, the CART model integrates linguistic features into the mapping. A pitch target model, which is optimized to describe Mandarin F0 contours, is also introduced. For all conversion methods, a deviation of perceived expressiveness (DPE) measure is created to evaluate the expressiveness of the output speech. The results show that the LMM gives the worst results among the three methods. The GMM method is more suitable for a small training set, while the CART method gives better emotional speech output if trained with a large context-balanced corpus. The methods discussed in this paper indicate ways to generate emotional speech in speech synthesis. The objective and subjective evaluation processes are also analyzed. These results support the use of a neutral semantic content text in databases for emotional speech synthesis.
Pages: 1145-1154
Page count: 10
Related papers
50 in total
  • [41] Unmasking effects of speech emotional prosody and semantics on auditory informational masking
    Zheng Xi
    Zhang Tingting
    Li Liang
    Fan Ning
    Yang Zhigang
    [J]. ACTA PSYCHOLOGICA SINICA, 2023, 55 (02) : 177 - 191
  • [42] Psychophysiological features of perceptual learning in the process of speech emotional prosody recognition
    Dmitrieva, E.
    Gelman, V.
    Zaitseva, K.
    Orlov, A.
    [J]. INTERNATIONAL JOURNAL OF PSYCHOPHYSIOLOGY, 2012, 85 (03) : 375 - 375
  • [43] Automatic Emphasis Labeling for Emotional Speech by Measuring Prosody Generation Error
    Xu, Jun
    Cai, Lian-Hong
    [J]. EMERGING INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2009, 5754 : 177 - 186
  • [44] Musical Speech: a New Methodology for Transcribing Speech Prosody
    Meireles, Alexsandro R.
    Simoes, Antonio R. M.
    Ribeiro, Antonio Celso
    de Medeiros, Beatriz Raposo
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017 : 334 - 338
  • [45] Use of Emotional and Neutral Speech in Evaluating Compression Speeds
    Korhonen, Petri
    Kuk, Francis
    Slugocki, Christopher
    Davis-Ruperto, Neal
    [J]. JOURNAL OF THE AMERICAN ACADEMY OF AUDIOLOGY, 2021, 32 (04) : 268 - 274
  • [46] Prosody generation in text-to-speech conversion using dependency graphs
    Lindstrom, A
    Bretan, I
    Ljungqvist, M
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996 : 1341 - 1344
  • [47] Modification of energy spectra, epoch parameters and prosody for emotion conversion in speech
    Haque A.
    Rao K.S.
    [J]. International Journal of Speech Technology, 2017, 20 (1) : 15 - 25
  • [48] Noise and acoustic modeling with waveform generator in text-to-speech and neutral speech conversion
    Al-Radhi, Mohammed Salah
    Csapo, Tamas Gabor
    Nemeth, Geza
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (02) : 1969 - 1994
  • [49] Noise and acoustic modeling with waveform generator in text-to-speech and neutral speech conversion
    Mohammed Salah Al-Radhi
    Tamás Gábor Csapó
    Géza Németh
    [J]. Multimedia Tools and Applications, 2021, 80 : 1969 - 1994
  • [50] Voice Conversion to Emotional Speech based on Three-layered Model in Dimensional Approach and Parameterization of Dynamic Features in Prosody
    Xue, Yawen
    Hamada, Yasuhiro
    Akagi, Masato
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,