A statistical approach for modeling prosody features using POS tags for emotional speech synthesis

Cited by: 0

Authors:

Bulut, Murtaza [1]

Lee, Sungbok [1]

Narayanan, Shrikanth [1]

Affiliations:

[1] Univ South Calif, Dept Elect Engn, Los Angeles, CA 90089 USA

Source:

2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 | 2007

Keywords:

POS; emotion; prosody; energy; conversion;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Deriving statistical models for emotional speech processing is a challenging problem because of the highly varying nature of emotion expressions. We address this problem by modeling prosodic parameter differences at the part-of-speech (POS) level for emotional utterances for the purpose of emotional speech synthesis. Synthesis at the POS level is appealing because POS tags carry salient information conveying speech prominence. Analysis of energy, duration and F0 differences between matching neutral-angry, neutral-sad and neutral-happy emotional utterance pairs shows that Gaussian distributions can be used to model the parameter differences. Pairwise comparisons of POS features reveal that it is more probable that the normalized mean and median energy of sad POS tags are larger than those of neutral, angry or happy POS tags. They also show that for particular tags it is more likely that angry emotion has a higher F0 median than happy emotion, and that sad emotion has a higher F0 median than neutral emotion. Experiments on converting neutral speech into emotional speech using the Gaussian probability functions provide helpful insights into the application of statistical models in speech synthesis.
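
The idea in the abstract, fitting Gaussian distributions to neutral-versus-emotional prosodic differences per POS tag and then using them to shift neutral features, might be sketched roughly as follows. This is only an illustration based on the abstract, not the authors' procedure: the function names (`fit_difference_models`, `convert`), the single scalar feature per POS token, the choice of "emotional minus neutral" as the difference, and the toy numbers are all assumptions.

```python
import random
import statistics
from collections import defaultdict


def fit_difference_models(pairs):
    """Fit one Gaussian per (emotion, POS tag) to emotional-minus-neutral differences.

    pairs: iterable of (emotion, pos_tag, neutral_value, emotional_value),
    where the values are some prosodic feature (e.g. an F0 median) measured
    on matching neutral and emotional renditions. Hypothetical input format.
    Returns {(emotion, pos_tag): (mean, std)}.
    """
    diffs = defaultdict(list)
    for emotion, pos, neutral, emotional in pairs:
        diffs[(emotion, pos)].append(emotional - neutral)

    models = {}
    for key, values in diffs.items():
        if len(values) < 2:
            continue  # a sample standard deviation needs at least two points
        models[key] = (statistics.mean(values), statistics.stdev(values))
    return models


def convert(neutral_value, emotion, pos, models, rng=None):
    """Shift a neutral feature value toward the target emotion.

    Without rng, the Gaussian mean is added (deterministic).
    With rng (a random.Random), the shift is sampled from the Gaussian.
    Values for unseen (emotion, POS) pairs are returned unchanged.
    """
    if (emotion, pos) not in models:
        return neutral_value
    mean, std = models[(emotion, pos)]
    if rng is None:
        return neutral_value + mean
    return neutral_value + rng.gauss(mean, std)


# Toy, made-up measurements purely for demonstration.
toy_pairs = [
    ("angry", "NN", 120.0, 150.0),
    ("angry", "NN", 110.0, 134.0),
    ("angry", "NN", 130.0, 157.0),
    ("sad", "VB", 125.0, 118.0),
    ("sad", "VB", 115.0, 111.0),
]
models = fit_difference_models(toy_pairs)
shifted = convert(100.0, "angry", "NN", models)
sampled = convert(100.0, "angry", "NN", models, rng=random.Random(0))
```

Adding the mean gives a repeatable conversion, while sampling reflects the variability that the abstract attributes to emotional expression; which of the two (if either) matches the paper's experiments is not stated in the abstract.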

Pages: 1237 / +

Number of pages: 2
