Automatic source speaker selection for voice conversion

Cited by: 0

Authors:

Turk, Oytun [1]

Arslan, Levent M. [2]

Affiliations:

[1] Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey

[2] Sestek Inc, R&D Dept, TR-34342 Istanbul, Turkey

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2009 / Vol. 125 / No. 1

Keywords:

hearing; learning (artificial intelligence); neural nets; regression analysis; speaker recognition; speech coding; PROCESSING TECHNIQUES; TRANSFORMATION; QUALITY;

DOI:

10.1121/1.3027445

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper focuses on the importance of source speaker selection for a weighted codebook mapping based voice conversion algorithm. First, the dependency on source speakers is evaluated in a subjective listening test using 180 different source-target pairs from a database of 20 speakers. Subjective scores for similarity to target speaker's voice and quality are obtained. Statistical analysis of scores confirms the dependence of performance on source speakers for both male-to-male and female-to-female transformations. A source speaker selection algorithm is devised given a target speaker and a set of source speaker candidates. For this purpose, an artificial neural network (ANN) is trained that learns the regression between a set of acoustical distance measures and the subjective scores. The estimated scores are used in source speaker ranking. The average cross-correlation coefficient between rankings obtained from median subjective scores and rankings estimated by the algorithm is 0.84 for similarity and 0.78 for quality in male-to-male transformations. The results for female-to-female transformations were less reliable with a cross-correlation value of 0.58 for both similarity and quality.
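The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes the estimated scores are already available (the ANN regression itself is not reproduced here) and that the agreement between rankings is measured as a Pearson correlation of rank positions; the abstract does not specify the exact formula, so this is an assumption. The speaker names, scores, and the helper names `rank_sources` and `rank_correlation` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def rank_sources(scores):
    """Map each candidate source speaker to a rank (1 = highest score).

    Ties are broken alphabetically by speaker name (an arbitrary choice).
    """
    ordered = sorted(scores, key=lambda s: (-scores[s], s))
    return {speaker: i + 1 for i, speaker in enumerate(ordered)}


def rank_correlation(ranks_a, ranks_b):
    """Pearson correlation between two rankings of the same speakers."""
    speakers = sorted(ranks_a)
    a = [ranks_a[s] for s in speakers]
    b = [ranks_b[s] for s in speakers]
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Made-up scores for one target speaker and four candidate sources.
# "subjective" stands in for median listening-test scores,
# "estimated" for the outputs of a trained regressor.
subjective = {"src_A": 4.0, "src_B": 3.0, "src_C": 2.5, "src_D": 1.5}
estimated = {"src_A": 3.6, "src_B": 3.8, "src_C": 2.2, "src_D": 1.9}

subjective_ranks = rank_sources(subjective)
estimated_ranks = rank_sources(estimated)

# The selected source is the candidate with the best estimated rank.
best_source = min(estimated_ranks, key=estimated_ranks.get)
rho = rank_correlation(subjective_ranks, estimated_ranks)

print(best_source, round(rho, 2))
```

In this toy case the estimator swaps the top two candidates, so the selected source differs from the listeners' first choice even though the two rankings still agree closely overall.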


Pages: 480-491

Number of pages: 12
