Emotional speech synthesis based on improved codebook mapping voice conversion

Authors
Wang, YP [1]
Ling, ZH [1]
Wang, RH [1]
Affiliation
[1] Univ Sci & Technol China, iFlytek Speech Lab, Hefei 230026, Peoples R China
Source
AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION, PROCEEDINGS | 2005 / Vol. 3784
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a spectral transformation method for emotional speech synthesis based on a voice conversion framework. Three emotions are studied: anger, happiness and sadness. To achieve high naturalness, speech quality and emotional expressiveness, our original STASC system is modified by introducing a new feature selection strategy and a hierarchical codebook mapping procedure. Our results show that the low-frequency LSF coefficients carry more emotion-related information, and therefore only these coefficients are converted. Listening tests show that the proposed method achieves a satisfactory balance between emotional expression and speech quality in the converted speech signals.
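The idea of converting only the low-frequency LSF coefficients can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not describe the feature selection strategy or the hierarchical codebook mapping, so the code below assumes a plain soft-weighted codebook lookup. The function name `convert_low_lsf`, the cutoff `n_low`, the weighting sharpness `gamma`, and the paired source/target codebooks are all hypothetical.

```python
import numpy as np

def convert_low_lsf(lsf, src_codebook, tgt_codebook, n_low, gamma=10.0):
    """Convert only the first n_low LSF coefficients of one frame.

    lsf:          (order,) LSF vector of the neutral (source) frame
    src_codebook: (K, order) source codebook entries
    tgt_codebook: (K, order) target (emotional) entries; row k is paired with row k
    n_low, gamma: hypothetical parameters, not taken from the paper
    """
    lsf = np.asarray(lsf, dtype=float)
    src = np.asarray(src_codebook, dtype=float)[:, :n_low]
    tgt = np.asarray(tgt_codebook, dtype=float)[:, :n_low]

    # Distance between the frame's low-order LSFs and every source entry.
    dist = np.linalg.norm(src - lsf[:n_low], axis=1)

    # Soft weights: closer source entries contribute more (assumed weighting).
    weights = np.exp(-gamma * (dist - dist.min()))
    weights /= weights.sum()

    # Replace the low-order part with the weighted sum of target entries;
    # the high-order coefficients are copied through unchanged.
    converted = lsf.copy()
    converted[:n_low] = weights @ tgt
    return converted

src_cb = np.array([[0.1, 0.2, 1.0, 2.0], [0.5, 0.6, 1.5, 2.5]])
tgt_cb = np.array([[0.2, 0.3, 1.1, 2.1], [0.6, 0.7, 1.6, 2.6]])
frame = np.array([0.1, 0.2, 1.2, 2.2])
out = convert_low_lsf(frame, src_cb, tgt_cb, n_low=2, gamma=1000.0)
```

A practical system would also need to check that the converted low-order values stay in increasing order and below the untouched high-order ones, since LSFs must be monotonically increasing for a stable synthesis filter; the sketch omits that check.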
Pages: 374-381
Page count: 8