Conversion function clustering and selection for expressive voice conversion

Authors
Hsia, Chi-Chun [1]
Wu, Chung-Hsien [1]
Wu, Jian-Qi [1]
Affiliation
[1] National Cheng Kung University, Department of Computer Science and Information Engineering, Tainan 70101, Taiwan
Keywords
speech synthesis; voice conversion; Gaussian mixture bi-gram model; linguistic information; expression
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this study, a conversion function clustering and selection approach to conversion-based expressive speech synthesis is proposed. First, a set of small emotional parallel speech databases is designed and collected to train the conversion functions. A Gaussian mixture bi-gram model (GMBM) is adopted as the conversion function to model the temporal and spectral evolution of speech. Conversion functions initially constructed from the parallel sub-syllable pairs in the speech databases are clustered based on linguistic and spectral information. Subjective and objective evaluations with statistical hypothesis testing were conducted to assess the quality of the converted speech. The results show that the proposed method exhibits encouraging potential for conversion-based expressive speech synthesis.
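The abstract does not spell out how conversion functions are clustered or selected. The sketch below is a hypothetical illustration under stated assumptions, not the authors' GMBM-based procedure: it assumes each conversion function can be summarized by one descriptor that concatenates linguistic features (here a toy one-hot phonetic class) with spectral features (here a toy mean cepstral vector), clusters those descriptors with plain k-means (farthest-first initialization), and selects a cluster for a new input unit by nearest centroid. The names `ConversionFunction`, `descriptor`, `kmeans`, `nearest_centroid`, the weights `w_ling`/`w_spec`, and all data values are illustrative.

```python
# Hypothetical sketch: cluster conversion functions by linguistic + spectral
# descriptors, then select a cluster for a new unit. Not the authors' method.
from dataclasses import dataclass
from math import dist
from typing import List, Sequence, Tuple


@dataclass
class ConversionFunction:
    # Assumed record: one conversion function trained on one parallel
    # sub-syllable pair (e.g., neutral -> emotional).
    unit_id: str
    linguistic: List[float]  # assumed: one-hot phonetic-class features
    spectral: List[float]    # assumed: mean cepstral vector of the source unit


def descriptor(f: ConversionFunction, w_ling: float = 1.0,
               w_spec: float = 1.0) -> List[float]:
    """Concatenate weighted linguistic and spectral features into one vector."""
    return [w_ling * x for x in f.linguistic] + [w_spec * x for x in f.spectral]


def nearest_centroid(x: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: dist(x, centroids[c]))


def kmeans(points: List[List[float]], k: int,
           iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means with deterministic farthest-first initialization."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [nearest_centroid(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Usage with toy data: two phonetic classes with distinct spectra.
funcs = [
    ConversionFunction("a1", [1, 0], [0.1, 0.2]),
    ConversionFunction("a2", [1, 0], [0.2, 0.1]),
    ConversionFunction("b1", [0, 1], [2.0, 2.1]),
    ConversionFunction("b2", [0, 1], [2.1, 1.9]),
]
labels, centroids = kmeans([descriptor(f) for f in funcs], k=2)

# Selection: route a new input unit to the nearest cluster of functions.
query = descriptor(ConversionFunction("q", [0, 1], [2.0, 2.0]))
chosen = nearest_centroid(query, centroids)
candidates = [f.unit_id for f, lab in zip(funcs, labels) if lab == chosen]
print(candidates)  # ['b1', 'b2']
```

Weighting the two feature groups separately is one simple way to trade off linguistic against spectral similarity; how the paper actually balances them is not stated in this record.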
Pages: 689 / +
Number of pages: 2