A statistical approach for modeling prosody features using POS tags for emotional speech synthesis

Cited by: 0

Authors:

Bulut, Murtaza [1]

Lee, Sungbok [1]

Narayanan, Shrikanth [1]

Affiliation:

[1] Univ South Calif, Dept Elect Engn, Los Angeles, CA 90089 USA

Source:

2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 | 2007

Keywords:

POS; emotion; prosody; energy; conversion;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403

Abstract:

Deriving statistical models for emotional speech processing is a challenging problem because emotion expressions vary so widely. We address this problem by modeling prosodic parameter differences at the part-of-speech (POS) level of emotional utterances, with the aim of emotional speech synthesis. Synthesis at the POS level is appealing because POS tags carry salient information about speech prominence. Analysis of energy, duration and F0 differences between matched neutral-angry, neutral-sad and neutral-happy utterance pairs shows that Gaussian distributions can be used to model the parameter differences. Pairwise comparisons of POS features reveal that the normalized mean and median energy of sad POS tags are more likely to be larger than those of neutral, angry or happy POS tags. They also show that, for particular tags, angry speech is more likely than happy speech to have a higher F0 median, and sad speech is more likely than neutral speech to have a higher F0 median. Experiments on converting neutral speech into emotional speech with the Gaussian probability functions provide helpful insights into applying statistical models in speech synthesis.
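The abstract describes three steps: fitting Gaussians to per-POS prosodic differences between matched neutral and emotional utterances, comparing features pairwise in probabilistic terms, and using the fitted Gaussians to convert neutral speech. The Python sketch below illustrates those ideas under stated assumptions; it is not the authors' implementation. The helper names (`fit_difference_gaussian`, `prob_greater`, `convert_value`) and the toy values are hypothetical. The pairwise comparison assumes the two features are independent Gaussians. Because the abstract does not say whether conversion adds the mean difference or samples from the distribution, both options are shown.

```python
from statistics import NormalDist

# Hypothetical helpers illustrating the Gaussian modeling idea in the abstract;
# this is an assumed sketch, not the authors' implementation.


def fit_difference_gaussian(neutral, emotional):
    """Fit a Gaussian to per-token differences (emotional - neutral).

    `neutral` and `emotional` are matched feature values (e.g. energy, duration
    or F0 median) for the same POS tag in parallel utterance pairs.
    """
    if len(neutral) != len(emotional) or len(neutral) < 2:
        raise ValueError("need at least two matched neutral/emotional pairs")
    diffs = [e - n for n, e in zip(neutral, emotional)]
    return NormalDist.from_samples(diffs)  # sample mean and sample stdev


def prob_greater(a, b):
    """P(A > B) for independent Gaussians A and B, i.e. 1 - P(A - B <= 0)."""
    # NormalDist subtraction yields the distribution of A - B for independent A, B.
    return 1.0 - (a - b).cdf(0.0)


def convert_value(neutral_value, diff_model, seed=None):
    """Shift a neutral feature value toward the target emotion.

    With seed=None the mean difference is added (deterministic); otherwise one
    difference is drawn from the fitted Gaussian (assumed alternative).
    """
    if seed is None:
        return neutral_value + diff_model.mean
    return neutral_value + diff_model.samples(1, seed=seed)[0]


if __name__ == "__main__":
    # Toy matched values for one POS tag (hypothetical numbers, not from the paper).
    neutral = [1.0, 2.0, 3.0, 4.0]
    angry = [1.5, 2.7, 3.4, 4.6]
    model = fit_difference_gaussian(neutral, angry)
    print(round(model.mean, 2))                 # 0.55
    print(round(convert_value(2.0, model), 2))  # 2.55
    # Pairwise comparison of two per-emotion feature distributions.
    print(prob_greater(NormalDist(1.0, 1.0), NormalDist(1.0, 1.0)))  # 0.5
```

In practice one would fit a separate model per POS tag, per prosodic feature and per emotion pair; the independence assumption in `prob_greater` is a simplification for illustration.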


Pages: 1237 ff.

Number of pages: 2
