Runtime and Speech Quality Survey of a Voice Conversion Method

Cited by: 0
Authors
Jokisch, Oliver [1]
Birhanu, Yitagessu [2]
Hoffmann, Ruediger [2]
Affiliations
[1] Leipzig Univ Telecommun, Inst Commun Engn, Gustav Freytag St 43, D-04277 Leipzig, Germany
[2] Tech Univ Dresden, Chair Syst Theory & Speech Technol, D-01069 Dresden, Germany
Source
Keywords
voice conversion; VTLN; runtime performance; speech quality; MOS
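DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract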
Several methods for voice conversion have been established. The research aims to reproduce the characteristics of a target speaker while maintaining near-natural speech quality. This contribution summarizes listening experiments with four conversion methods, covering the assessment of speech quality, listening effort, and similarity to the target voice. The subjective evaluation of similarity is cross-checked against an instrumental distance measure based on logarithmic spectral distortion. Practical applications of voice conversion also require appropriate runtime performance and memory use. We select a conversion method based on vocal tract length normalization (VTLN) to demonstrate the trade-off between runtime and quality. In this case study, we survey how the quality assessment depends on different training configurations with varied amounts of data and training time. Furthermore, we discuss the runtime performance of the selected conversion method under typical operating conditions. The experiments cover the influence of system resources, the setting of conversion parameters (warping factors), and different training configurations. The observed real-time factors of a non-optimized laboratory version of the voice conversion (VC) system are unsuitable for typical application scenarios.
Pages: 1684-1688
Number of pages: 5
Related papers
50 in total
  • [1] IMPROVING VOICE QUALITY OF HMM-BASED SPEECH SYNTHESIS USING VOICE CONVERSION METHOD
    Jiao, Yishan
    Xie, Xiang
    Na, Xingyu
    Tu, Ming
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [2] A Comparison of Voice Conversion Methods for Transforming Voice Quality in Emotional Speech Synthesis
    Tuerk, Oytun
    Schroeder, Marc
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2282 - 2285
  • [3] Voice quality conversion in TD-PSOLA speech synthesis
    Sun, XJ
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 953 - 956
  • [4] A ANN BASED HIGH QUALITY METHOD FOR VOICE CONVERSION
    Chen, Z.
    Zhang, L. H.
    [J]. 2010 6TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS NETWORKING AND MOBILE COMPUTING (WICOM), 2010
  • [5] On the transformation of the speech spectrum for voice conversion
    Baudoin, G
    Stylianou, Y
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1405 - 1408
  • [6] Voice Conversion for Whispered Speech Synthesis
    Cotescu, Marius
    Drugman, Thomas
    Huybrechts, Goeric
    Lorenzo-Trueba, Jaime
    Moinet, Alexis
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 186 - 190
  • [7] Iteratively Improving Speech Recognition and Voice Conversion
    Singh, Mayank Kumar
    Takahashi, Naoya
    Onoe, Naoyuki
    [J]. INTERSPEECH 2023, 2023, : 206 - 210
  • [8] The effect of speech melody on voice quality
    Swerts, M
    Veldhuis, R
    [J]. SPEECH COMMUNICATION, 2001, 33 (04) : 297 - 303
  • [9] The analysis of voice quality in speech processing
    Keller, E
    [J]. NONLINEAR SPEECH MODELING AND APPLICATIONS, 2005, 3445 : 54 - 73
  • [10] The emotional quality of speech in voice services
    Maffiolo, V
    Chateau, N
    [J]. ERGONOMICS, 2003, 46 (13-14) : 1375 - 1385