A statistical approach for modeling prosody features using POS tags for emotional speech synthesis

Cited by: 0
Authors
Bulut, Murtaza [1]
Lee, Sungbok [1]
Narayanan, Shrikanth [1]
Affiliations
[1] Univ South Calif, Dept Elect Engn, Los Angeles, CA 90089 USA
Keywords
POS; emotion; prosody; energy; conversion;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Deriving statistical models for emotional speech processing is a challenging problem because emotion expressions vary widely. We address this problem by modeling prosodic parameter differences at the part-of-speech (POS) level for emotional utterances, for the purpose of emotional speech synthesis. Synthesis at the POS level is appealing because POS tags carry salient information conveying speech prominence. Analysis of energy, duration and F0 differences between matching neutral-angry, neutral-sad and neutral-happy emotional utterance pairs shows that Gaussian distributions can be used to model the parameter differences. Pairwise comparisons of POS features reveal that the normalized mean and median energy of sad POS tags are more likely to be larger than those of neutral, angry or happy POS tags. They also show that, for particular tags, angry emotion is more likely to have a higher F0 median than happy emotion, and sad emotion is more likely to have a higher F0 median than neutral emotion. Experiments on converting neutral speech into emotional speech using the Gaussian probability functions provide helpful insights into the application of statistical models in speech synthesis.
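The idea of modeling neutral-to-emotional parameter differences per POS tag with Gaussian distributions can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fit_difference_models` and `convert`, the choice of F0 median as the feature, and all numeric values are hypothetical and chosen only for illustration. The sketch assumes one scalar feature per word and at least two paired observations per tag.

```python
from statistics import NormalDist


def fit_difference_models(pairs):
    """Fit one Gaussian per POS tag to (emotional - neutral) differences.

    pairs: iterable of (pos_tag, neutral_value, emotional_value).
    Tags with fewer than two observations are skipped, because a sample
    standard deviation cannot be estimated from a single point.
    """
    diffs = {}
    for tag, neutral, emotional in pairs:
        diffs.setdefault(tag, []).append(emotional - neutral)
    return {
        tag: NormalDist.from_samples(values)
        for tag, values in diffs.items()
        if len(values) >= 2
    }


def convert(neutral, models, seed=None):
    """Shift neutral feature values toward the target emotion.

    neutral: list of (pos_tag, value). With seed=None each value is shifted
    by the mean difference of its tag; otherwise the shift is drawn from the
    fitted Gaussian. Values with an unmodeled tag are left unchanged.
    """
    converted = []
    for index, (tag, value) in enumerate(neutral):
        model = models.get(tag)
        if model is None:
            converted.append(value)
        elif seed is None:
            converted.append(value + model.mean)
        else:
            converted.append(value + model.samples(1, seed=seed + index)[0])
    return converted


# Hypothetical F0-median values in Hz for neutral/angry word pairs.
training_pairs = [
    ("NN", 120.0, 150.0),
    ("NN", 110.0, 130.0),
    ("VB", 100.0, 105.0),
    ("VB", 115.0, 125.0),
]
models = fit_difference_models(training_pairs)
result = convert([("NN", 100.0), ("VB", 100.0), ("JJ", 100.0)], models)
print(result)
```

Shifting by the mean gives a deterministic conversion, while passing a `seed` draws the shift from the fitted distribution, which is one possible way to reflect the variability of emotional expression that the abstract mentions.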
Pages: 1237 / +
Page count: 2