Automatic source speaker selection for voice conversion

Cited by: 0

Authors:

Turk, Oytun [1]

Arslan, Levent M. [2]

Affiliations:

[1] Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey

[2] Sestek Inc, R&D Dept, TR-34342 Istanbul, Turkey

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2009 / Vol. 125 / No. 1

Keywords:

hearing; learning (artificial intelligence); neural nets; regression analysis; speaker recognition; speech coding; PROCESSING TECHNIQUES; TRANSFORMATION; QUALITY;

DOI:

10.1121/1.3027445

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper focuses on the importance of source speaker selection for a weighted codebook mapping based voice conversion algorithm. First, the dependency on source speakers is evaluated in a subjective listening test using 180 different source-target pairs from a database of 20 speakers. Subjective scores for similarity to target speaker's voice and quality are obtained. Statistical analysis of scores confirms the dependence of performance on source speakers for both male-to-male and female-to-female transformations. A source speaker selection algorithm is devised given a target speaker and a set of source speaker candidates. For this purpose, an artificial neural network (ANN) is trained that learns the regression between a set of acoustical distance measures and the subjective scores. The estimated scores are used in source speaker ranking. The average cross-correlation coefficient between rankings obtained from median subjective scores and rankings estimated by the algorithm is 0.84 for similarity and 0.78 for quality in male-to-male transformations. The results for female-to-female transformations were less reliable with a cross-correlation value of 0.58 for both similarity and quality.
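The ranking and evaluation step described in the abstract can be illustrated with a minimal sketch. It assumes the estimated scores have already been produced by some regressor (the ANN itself is not modeled here), and it uses a Pearson correlation between rank vectors as one plausible reading of "cross-correlation coefficient between rankings"; the abstract does not give the exact definition. All speaker names and score values are hypothetical.

```python
from statistics import mean


def rank_candidates(scores):
    """Return candidate names ordered from highest to lowest score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def rank_positions(scores):
    """Map each candidate to its 1-based rank (1 = best score)."""
    return {name: i + 1 for i, name in enumerate(rank_candidates(scores))}


def rank_correlation(subjective, estimated):
    """Pearson correlation between two rank vectors (assumed metric, no tie handling)."""
    names = sorted(subjective)
    a = rank_positions(subjective)
    b = rank_positions(estimated)
    x = [a[n] for n in names]
    y = [b[n] for n in names]
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for four source speaker candidates and one target speaker.
subjective = {"src_a": 4.0, "src_b": 3.0, "src_c": 2.0, "src_d": 1.0}  # median listener scores
estimated = {"src_a": 3.6, "src_b": 3.9, "src_c": 2.2, "src_d": 1.5}   # regressor outputs

best_source = rank_candidates(estimated)[0]   # candidate the selector would pick
rho = rank_correlation(subjective, estimated)
```

In this toy case the two rankings differ only in the order of the top two candidates. Averaging such a coefficient over all target speakers would give a summary figure comparable in spirit to the 0.84, 0.78, and 0.58 values reported in the abstract, if the paper's metric matches the one assumed here.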

Pages: 480-491

Page count: 12
