Conversion function clustering and selection using linguistic and spectral information for emotional voice conversion

Times cited: 17
Authors
Hsia, Chi-Chun [1]
Wu, Chung-Hsien [1]
Wu, Jian-Qi [1]
Affiliation
[1] National Cheng Kung University, Department of Computer Science and Information Engineering, Tainan 70101, Taiwan
Keywords
emotional text-to-speech synthesis; emotional voice conversion; linguistic feature; function clustering and selection; Gaussian mixture bigram model
DOI
10.1109/TC.2007.1079
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In emotional speech synthesis, a large speech database is required for high-quality speech output. Voice conversion, by contrast, needs only a compact speech database for each emotion. This study designs and collects a set of phonetically balanced, small-sized emotional parallel speech databases from which conversion functions are constructed. The Gaussian mixture bigram model (GMBM) is adopted as the conversion function to characterize the temporal and spectral evolution of the speech signal. A conversion function is initially constructed for each instance of a parallel subsyllable pair in the collected speech database. To reduce the total number of conversion functions and to select an appropriate one at conversion time, this study presents a framework that incorporates linguistic and spectral information into conversion function clustering and selection. Subjective and objective evaluations with statistical hypothesis testing are conducted to assess the quality of the converted speech. The proposed method compares favorably with previous methods for conversion-based emotional speech synthesis.
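The abstract does not give the GMBM equations. As a rough illustration of what a Gaussian-mixture conversion function computes, the sketch below implements the conventional joint-density GMM mapping for a single scalar spectral feature: each mixture component contributes a linear regression from source to target, weighted by its posterior probability given the source frame. This is a swapped-in, simpler technique: it does not model the bigram (frame-to-frame transition) dependency that distinguishes the GMBM, and the class, function names, and parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One mixture component of a hypothetical scalar joint-density GMM."""
    weight: float  # mixture weight alpha_i
    mu_x: float    # source-feature mean
    mu_y: float    # target-feature mean
    var_xx: float  # source-feature variance (must be > 0)
    cov_yx: float  # target/source cross-covariance


def posteriors(x: float, comps: list[Component]) -> list[float]:
    """Return p(i | x) for each component, computed in the log domain to avoid underflow."""
    log_likes = [
        math.log(c.weight)
        - 0.5 * (x - c.mu_x) ** 2 / c.var_xx
        - 0.5 * math.log(2.0 * math.pi * c.var_xx)
        for c in comps
    ]
    peak = max(log_likes)
    scaled = [math.exp(l - peak) for l in log_likes]
    total = sum(scaled)
    return [s / total for s in scaled]


def convert(x: float, comps: list[Component]) -> float:
    """F(x) = sum_i p(i|x) * (mu_y_i + cov_yx_i / var_xx_i * (x - mu_x_i))."""
    post = posteriors(x, comps)
    return sum(
        p * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
        for p, c in zip(post, comps)
    )


# Usage: two hypothetical components mapping a source feature to a target feature.
example = [Component(0.5, -1.0, -2.0, 1.0, 0.3), Component(0.5, 1.0, 2.0, 1.0, 0.3)]
converted = [convert(x, example) for x in (-1.5, 0.0, 1.5)]
```

In a real system the feature would be a spectral vector (for example, cepstral coefficients) and each component would use mean vectors and covariance matrices. Per the abstract, the paper builds one conversion function per parallel subsyllable pair and then clusters and selects among them using linguistic and spectral information; the sketch covers only the mapping step of a single function.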
Pages: 1245-1254
Page count: 10
Related papers
50 in total
  • [1] Conversion function clustering and selection for expressive voice conversion
    Hsia, Chi-Chun
    Wu, Chung-Hsien
    Wu, Jian-Qi
    2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. IV, Pts. 1-3, 2007: 689+
  • [2] Dynamic Model Selection for Spectral Voice Conversion
    Lanchantin, Pierre
    Rodet, Xavier
    11th Annual Conference of the International Speech Communication Association 2010 (INTERSPEECH 2010), Vols. 3 and 4, 2010: 1720-1723
  • [3] English Emotional Voice Conversion Using StarGAN Model
    Meftah, Ali Hamid
    Alashban, Adal A.
    Alotaibi, Yousef A.
    Selouani, Sid Ahmed
    IEEE Access, 2023, 11: 67835-67849
  • [4] Novel Method for Data Clustering and Mode Selection with Application in Voice Conversion
    Nurminen, Jani
    Tian, Jilei
    Popa, Victor
    INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, Vols. 1-5, 2006: 2258-2261
  • [5] Using a Manifold Vocoder for Spectral Voice and Style Conversion
    Tuan Dinh
    Kain, Alexander
    Tjaden, Kris
    INTERSPEECH 2019, 2019: 1388-1392
  • [6] Voice conversion using partitions of spectral feature space
    Verhelst, W
    Mertens, J
    1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Conference Proceedings, Vols. 1-6, 1996: 365-368
  • [7] Objective Evaluation of the Dynamic Model Selection Method for Spectral Voice Conversion
    Lanchantin, Pierre
    Rodet, Xavier
    2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2011: 5132-5135
  • [8] Fundamental Frequency Modeling Using Wavelets for Emotional Voice Conversion
    Ming, Huaiping
    Huang, Dongyan
    Dong, Minghui
    Li, Haizhou
    Xie, Lei
    Zhang, Shaofei
    2015 International Conference on Affective Computing and Intelligent Interaction (ACII), 2015: 804-809
  • [9] Spectral Mapping Using Artificial Neural Networks for Voice Conversion
    Desai, Srinivas
    Black, Alan W.
    Yegnanarayana, B.
    Prahallad, Kishore
    IEEE Transactions on Audio, Speech, and Language Processing, 2010, 18(5): 954-964
  • [10] CycleGAN Voice Conversion of Spectral Envelopes using Adversarial Weights
    Ferro, Rafael
    Obin, Nicolas
    Roebel, Axel
    28th European Signal Processing Conference (EUSIPCO 2020), 2021: 406-410