A System for Transforming the Emotion in Speech: Combining Data-Driven Conversion Techniques for Prosody and Voice Quality

Cited by: 0
Authors
Inanoglu, Zeynep [1]
Young, Steve [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Keywords
expressive speech synthesis; emotion conversion; voice conversion
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a system that combines independent transformation techniques to endow a neutral utterance with a required target emotion. The system consists of three modules, each trained on a limited amount of speech data and each acting on a different temporal layer. F0 contours are modelled and generated using context-sensitive syllable HMMs, while durations are transformed using phone-based relative decision trees. For spectral conversion, which is applied at the segmental level, two methods were investigated: a GMM-based voice conversion approach and a codebook selection approach. Converted test data were evaluated for three emotions using an independent emotion classifier as well as perceptual listening tests. The listening test results show that perception of sadness in the system's output was comparable with the perception of human sad speech, while the perception of surprise and anger was around 5% worse than that of a human speaker.
Pages: 457-460
Page count: 4