A statistical approach for modeling prosody features using POS tags for emotional speech synthesis

Cited by: 0
Authors
Bulut, Murtaza [1]
Lee, Sungbok [1]
Narayanan, Shrikanth [1]
Affiliation
[1] University of Southern California, Department of Electrical Engineering, Los Angeles, CA 90089 USA
Keywords
POS; emotion; prosody; energy; conversion;
DOI
Not available
CLC classification number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
Deriving statistical models for emotional speech processing is a challenging problem because of the highly variable nature of emotional expression. We address this problem by modeling prosodic parameter differences at the part-of-speech (POS) level in emotional utterances, for the purpose of emotional speech synthesis. Synthesis at the POS level is appealing because POS tags carry salient information about speech prominence. Analysis of energy, duration and F0 differences between matching neutral-angry, neutral-sad and neutral-happy utterance pairs shows that Gaussian distributions can be used to model the parameter differences. Pairwise comparisons of POS features reveal that the normalized mean and median energy of sad POS tags are more likely to be larger than those of neutral, angry or happy POS tags. They also show that, for particular tags, angry speech is more likely to have a higher F0 median than happy speech, and sad speech a higher F0 median than neutral speech. Experiments on converting neutral speech into emotional speech using the Gaussian probability functions provide helpful insights into the application of statistical models in speech synthesis.
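As a rough illustration of the idea in the abstract, the sketch below fits one Gaussian per POS tag to the differences between matching neutral and emotional values of a single prosodic parameter, then shifts neutral values by the fitted mean. The function names, the toy F0 values, and the mean-shift conversion rule are all assumptions made for illustration; the abstract does not specify how the authors estimate or apply their Gaussian probability functions.

```python
from collections import defaultdict
from statistics import mean, stdev


def fit_difference_models(pairs):
    """Fit a Gaussian (mean, standard deviation) per POS tag.

    pairs: iterable of (pos_tag, neutral_value, emotional_value) for one
    prosodic parameter, e.g. F0 median. Hypothetical input layout.
    """
    diffs = defaultdict(list)
    for tag, neutral, emotional in pairs:
        diffs[tag].append(emotional - neutral)
    # The sample standard deviation needs at least two observations per tag.
    return {tag: (mean(d), stdev(d)) for tag, d in diffs.items() if len(d) >= 2}


def convert(neutral_words, models):
    """Shift each neutral value by the mean difference of its POS tag.

    Tags without a fitted model are left unchanged (an assumption).
    """
    converted = []
    for tag, value in neutral_words:
        mu, _sigma = models.get(tag, (0.0, 0.0))
        converted.append((tag, value + mu))
    return converted


# Toy, made-up F0 medians in Hz for neutral/emotional word pairs.
pairs = [
    ("NN", 120.0, 150.0),
    ("NN", 110.0, 130.0),
    ("VB", 100.0, 105.0),
    ("VB", 100.0, 115.0),
]
models = fit_difference_models(pairs)
converted = convert([("NN", 100.0), ("VB", 100.0), ("DT", 90.0)], models)
```

A sampling variant could draw each shift from the fitted Gaussian (for example with `random.gauss(mu, sigma)`) instead of always using the mean, which would preserve some of the variability the abstract emphasizes.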
Pages: 1237 / +
Page count: 2