A statistical approach for modeling prosody features using POS tags for emotional speech synthesis

Cited by: 0
Authors
Bulut, Murtaza [1]
Lee, Sungbok [1]
Narayanan, Shrikanth [1]
Affiliation
[1] Univ South Calif, Dept Elect Engn, Los Angeles, CA 90089 USA
Keywords
POS; emotion; prosody; energy; conversion
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deriving statistical models for emotional speech processing is a challenging problem because emotional expression is highly variable. We address this problem by modeling prosodic parameter differences at the part-of-speech (POS) level in emotional utterances, for the purpose of emotional speech synthesis. Synthesis at the POS level is appealing because POS tags carry salient information about speech prominence. Analysis of energy, duration and F0 differences between matching neutral-angry, neutral-sad and neutral-happy utterance pairs shows that Gaussian distributions can be used to model the parameter differences. Pairwise comparisons of POS features reveal that the normalized mean and median energy of sad POS tags are more likely to be larger than those of neutral, angry or happy POS tags. They also show that, for particular tags, angry emotion is more likely to have a higher F0 median than happy emotion, and sad emotion is more likely to have a higher F0 median than neutral emotion. Experiments on converting neutral speech into emotional speech using the Gaussian probability functions provide helpful insights into the application of statistical models in speech synthesis.
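As a rough illustration of the idea in the abstract, the sketch below fits one Gaussian per POS tag to prosodic parameter differences, estimates how likely the emotional value is to exceed the neutral one, and shifts neutral values accordingly. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the "emotional minus neutral" sign convention, the tag labels and the toy numbers are all hypothetical.

```python
import math
import random
from statistics import mean, stdev


def fit_gaussian(differences):
    """Estimate (mean, standard deviation) of one tag's parameter differences."""
    return mean(differences), stdev(differences)


def prob_positive(mu, sigma):
    """P(D > 0) for D ~ N(mu, sigma^2): the emotional value exceeds the neutral one."""
    return 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))


def convert(neutral_values, tags, models, rng=None):
    """Shift each neutral value by a difference taken from its tag's Gaussian.

    With rng=None the mean difference is applied (deterministic);
    otherwise a difference is sampled with rng.gauss.
    """
    converted = []
    for value, tag in zip(neutral_values, tags):
        mu, sigma = models[tag]
        shift = mu if rng is None else rng.gauss(mu, sigma)
        converted.append(value + shift)
    return converted


# Hypothetical F0-median differences (emotional minus neutral), keyed by POS tag.
toy_differences = {"NN": [1.0, 2.0, 3.0], "VB": [-1.0, 0.0, 1.0]}
models = {tag: fit_gaussian(d) for tag, d in toy_differences.items()}

# Deterministic conversion using the mean difference of each tag.
mean_shifted = convert([100.0, 120.0], ["NN", "VB"], models)

# Stochastic conversion: one sampled difference per word, seeded for repeatability.
sampled = convert([100.0, 120.0], ["NN", "VB"], models, rng=random.Random(0))
```

Applying the mean difference gives a repeatable conversion, while sampling reflects the spread of the fitted distribution. Which of the two the original experiments used is not stated in the abstract.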
Pages: 1237 / +
Page count: 2