Automatic source speaker selection for voice conversion

Authors
Turk, Oytun [1]
Arslan, Levent M. [2]
Affiliations
[1] Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey
[2] Sestek Inc, R&D Dept, TR-34342 Istanbul, Turkey
Source
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2009 / Vol. 125 / No. 01
Keywords
hearing; learning (artificial intelligence); neural nets; regression analysis; speaker recognition; speech coding; PROCESSING TECHNIQUES; TRANSFORMATION; QUALITY;
DOI
10.1121/1.3027445
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
This paper focuses on the importance of source speaker selection for a weighted codebook mapping based voice conversion algorithm. First, the dependency on source speakers is evaluated in a subjective listening test using 180 different source-target pairs from a database of 20 speakers. Subjective scores for similarity to target speaker's voice and quality are obtained. Statistical analysis of scores confirms the dependence of performance on source speakers for both male-to-male and female-to-female transformations. A source speaker selection algorithm is devised given a target speaker and a set of source speaker candidates. For this purpose, an artificial neural network (ANN) is trained that learns the regression between a set of acoustical distance measures and the subjective scores. The estimated scores are used in source speaker ranking. The average cross-correlation coefficient between rankings obtained from median subjective scores and rankings estimated by the algorithm is 0.84 for similarity and 0.78 for quality in male-to-male transformations. The results for female-to-female transformations were less reliable with a cross-correlation value of 0.58 for both similarity and quality.
Pages: 480-491
Page count: 12
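
Illustrative sketch (not from the paper): the abstract describes ranking source speaker candidates by estimated scores and comparing that ranking against one derived from median subjective scores. The snippet below shows one plausible way to do such a comparison. The function names, the speaker labels, and the score values are all hypothetical, the ANN that produces the estimated scores is not reproduced, and the correlation used here (Pearson correlation computed on rank positions) is an assumption, since the abstract does not give the exact formula.

```python
from statistics import mean


def rank_candidates(scores):
    """Return candidate names ordered from highest to lowest score.

    Ties keep their insertion order (sorted() is stable); the paper's
    tie handling is not stated, so this is an assumption.
    """
    return sorted(scores, key=scores.get, reverse=True)


def rank_positions(scores):
    """Map each candidate name to its 1-based rank position."""
    return {name: pos for pos, name in enumerate(rank_candidates(scores), start=1)}


def rank_correlation(scores_a, scores_b):
    """Pearson correlation between the rank positions from two score sets.

    Both dictionaries must contain the same candidate names.
    """
    names = sorted(scores_a)
    pos_a, pos_b = rank_positions(scores_a), rank_positions(scores_b)
    xs = [pos_a[n] for n in names]
    ys = [pos_b[n] for n in names]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for four source speaker candidates and one target.
subjective = {"src_a": 4.0, "src_b": 3.0, "src_c": 2.0, "src_d": 1.0}
estimated = {"src_a": 3.6, "src_b": 3.9, "src_c": 2.2, "src_d": 1.4}

best_first = rank_candidates(estimated)
agreement = rank_correlation(subjective, estimated)
```

Here `best_first` lists the candidates in the order a selection step would try them, and `agreement` summarizes how closely the estimated ranking follows the subjective one for this single target. The abstract reports averages of such coefficients, which this sketch does not compute.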