Runtime and Speech Quality Survey of a Voice Conversion Method

Cited by: 0

Authors:

Jokisch, Oliver [1]

Birhanu, Yitagessu [2]

Hoffmann, Ruediger [2]

Affiliations:

[1] Leipzig Univ Telecommun, Inst Commun Engn, Gustav Freytag St 43, D-04277 Leipzig, Germany

[2] Tech Univ Dresden, Chair Syst Theory & Speech Technol, D-01069 Dresden, Germany

Source:

2013 IEEE EUROCON | 2013

Keywords:

voice conversion; VTLN; runtime performance; speech quality; MOS

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Several methods for voice conversion (VC) have been established. Research in this area aims to reproduce the characteristics of a target speaker with near-natural speech quality. This contribution summarizes listening experiments with four conversion methods, assessing speech quality, listening effort and similarity to the target voice. The subjective similarity ratings are cross-checked with an instrumental distance measure based on logarithmic spectral distortion. Practical applications of voice conversion also require adequate runtime performance and memory use. We select a conversion method based on vocal tract length normalization (VTLN) to demonstrate the trade-off between runtime and quality. In this case study, we examine how the assessed quality depends on different training configurations with varied amounts of training data and training time. Furthermore, we discuss the runtime performance of the selected method under typical operating conditions. The experiments cover the influence of system resources, the setting of the conversion parameters (warping factors) and different training configurations. The real-time factors observed for a non-optimized laboratory VC version are not adequate for typical application scenarios.
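The abstract relies on two instrumental quantities without defining them: a log spectral distortion (LSD) between target and converted speech, and the real-time factor (RTF). The sketch below is a minimal Python illustration. It assumes one common LSD definition: the RMS of the per-bin dB difference between two power spectra, averaged over time-aligned frames. It also assumes RTF is processing time divided by audio duration. The paper's exact formulation, frame settings and alignment procedure are not given here and may differ. The function names are hypothetical.

```python
import math
from typing import Sequence


# Hypothetical helper; one common LSD definition, not necessarily the paper's.
def log_spectral_distortion(
    ref_frames: Sequence[Sequence[float]],
    test_frames: Sequence[Sequence[float]],
    eps: float = 1e-12,
) -> float:
    """Mean log spectral distortion in dB over time-aligned frames.

    Each frame is a power spectrum (one non-negative value per frequency bin),
    e.g. target-speaker frames vs. converted frames after alignment.
    """
    if not ref_frames or len(ref_frames) != len(test_frames):
        raise ValueError("need the same non-zero number of aligned frames")
    total = 0.0
    for ref, test in zip(ref_frames, test_frames):
        if not ref or len(ref) != len(test):
            raise ValueError("frames must have the same non-zero number of bins")
        # Squared level difference in dB per bin; eps guards against log10(0).
        sq_sum = 0.0
        for p_ref, p_test in zip(ref, test):
            diff_db = 10.0 * math.log10((p_ref + eps) / (p_test + eps))
            sq_sum += diff_db * diff_db
        # RMS over bins gives the per-frame distortion in dB.
        total += math.sqrt(sq_sum / len(ref))
    # Average the per-frame values over all frames.
    return total / len(ref_frames)


# Hypothetical helper: processing time relative to the duration of the audio.
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Values above 1 mean processing is slower than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Every bin of the "converted" spectrum is 10x the reference power.
    print(log_spectral_distortion([[1.0, 4.0]], [[10.0, 40.0]]))  # approx. 10 dB
    # 12 s of processing for 4 s of speech.
    print(real_time_factor(12.0, 4.0))  # 3.0
```

An RTF above 1 means the system processes speech more slowly than it is played back. Interactive applications generally need values well below 1.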


Pages: 1684-1688

Page count: 5
