Emotional speech synthesis based on improved codebook mapping voice conversion

Cited by: 0
Authors
Wang, YP [1]
Ling, ZH [1]
Wang, RH [1]
Affiliations
[1] Univ Sci & Technol China, iFlytek Speech Lab, Hefei 230026, Peoples R China
Source
AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION, PROCEEDINGS | 2005 / Vol. 3784
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
This paper presents a spectral transformation method for emotional speech synthesis based on a voice conversion framework. Three emotions are studied: anger, happiness and sadness. To achieve high naturalness, good speech quality and emotional expressiveness, our original STASC system is modified by introducing a new feature selection strategy and a hierarchical codebook mapping procedure. Our results show that the low-frequency LSF coefficients carry more emotion-related information, and therefore only these coefficients are converted. Listening tests show that the proposed method achieves a satisfactory balance between emotional expression and speech quality in the converted speech signals.
Pages: 374-381
Number of pages: 8
Related papers
50 in total
  • [31] Hierarchical Prosody Conversion Using Regression-Based Clustering for Emotional Speech Synthesis
    Wu, Chung-Hsien
    Hsia, Chi-Chun
    Lee, Chung-Han
    Lin, Mai-Chun
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06): : 1394 - 1405
  • [32] Online Model Adaptation for Voice Conversion using Model-based Speech Synthesis Techniques
    Wu, Dalei
    Li, Baojie
    Jiang, Hui
    Fu, Qian-Jie
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1611 - +
  • [33] A hybrid GMM and codebook mapping method for spectral conversion
    Kang, YG
    Shuang, ZW
    Tao, JH
    Zhang, W
    Xu, B
    AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION, PROCEEDINGS, 2005, 3784 : 303 - 310
  • [34] StarGAN-based Emotional Voice Conversion for Japanese Phrases
    Moritani, Asuka
    Sakamoto, Shoki
    Ozaki, Ryo
    Kameoka, Hirokazu
    Taniguchi, Tadahiro
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 836 - 840
  • [35] The emotional quality of speech in voice services
    Maffiolo, V
    Chateau, N
    ERGONOMICS, 2003, 46 (13-14) : 1375 - 1385
  • [36] Electrolaryngeal Speech Enhancement with Statistical Voice Conversion based on CLDNN
    Kobayashi, Kazuhiro
    Toda, Tomoki
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 2115 - 2119
  • [37] Nonparallel Emotional Speech Conversion
    Gao, Jian
    Chakraborty, Deep
    Tembine, Hamidou
    Olaleye, Olaitan
    INTERSPEECH 2019, 2019, : 2858 - 2862
  • [38] A novel codebook-based excitation model for use in speech synthesis
    Csapo, Tamas Gabor
    Nemeth, Geza
    3RD IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM 2012), 2012, : 661 - 665
  • [39] Evaluation of Expressive Speech Synthesis With Voice Conversion and Copy Resynthesis Techniques
    Turk, Oytun
    Schroeder, Marc
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (05): : 965 - 973
  • [40] Voice Conversion to Emotional Speech based on Three-layered Model in Dimensional Approach and Parameterization of Dynamic Features in Prosody
    Xue, Yawen
    Hamada, Yasuhiro
    Akagi, Masato
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,