Prosody analysis and modeling for emotional speech synthesis

Cited by: 0
Authors
Jiang, DN
Zhang, W
Shen, LQ
Cai, LH
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Current concatenative text-to-speech systems can synthesize a variety of emotions, but the subtlety and range of the results are limited because large amounts of emotional speech data are required. This paper studies a more flexible approach based on analyzing and modeling emotional prosody features. Perceptual tests are first performed to investigate whether manipulating prosody features alone can convey the intended emotions. Then, based on the positive results, a single corpus with sufficient prosodic coverage is shared across different emotions in unit selection. Finally, an adaptation algorithm is proposed to predict the emotional prosody features. It models the prosodic variations due to linguistic cues and emotion cues separately, and requires only a small amount of data. Experiments on Mandarin show that the adaptation algorithm can obtain appropriate emotional prosody features, and that at least several emotions can be synthesized without a dedicated emotional corpus.
Pages: 281-284
Page count: 4