Automatic source speaker selection for voice conversion

Authors
Turk, Oytun [1]
Arslan, Levent M. [2]
Affiliations
[1] Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey
[2] Sestek Inc, R&D Dept, TR-34342 Istanbul, Turkey
Source
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2009, Vol. 125, No. 1
Keywords
hearing; learning (artificial intelligence); neural nets; regression analysis; speaker recognition; speech coding; PROCESSING TECHNIQUES; TRANSFORMATION; QUALITY;
DOI
10.1121/1.3027445
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper focuses on the importance of source speaker selection for a weighted codebook mapping based voice conversion algorithm. First, the dependency on source speakers is evaluated in a subjective listening test using 180 different source-target pairs from a database of 20 speakers. Subjective scores for similarity to target speaker's voice and quality are obtained. Statistical analysis of scores confirms the dependence of performance on source speakers for both male-to-male and female-to-female transformations. A source speaker selection algorithm is devised given a target speaker and a set of source speaker candidates. For this purpose, an artificial neural network (ANN) is trained that learns the regression between a set of acoustical distance measures and the subjective scores. The estimated scores are used in source speaker ranking. The average cross-correlation coefficient between rankings obtained from median subjective scores and rankings estimated by the algorithm is 0.84 for similarity and 0.78 for quality in male-to-male transformations. The results for female-to-female transformations were less reliable with a cross-correlation value of 0.58 for both similarity and quality.
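The ranking comparison described in the abstract can be illustrated with a minimal sketch. It assumes that the "cross-correlation coefficient between rankings" is a Pearson correlation computed on rank vectors (i.e. a Spearman-style rank correlation); the abstract does not state the exact definition. The scores below are hypothetical placeholders, not data from the paper, and the ANN that would produce the estimated scores is not modeled here.

```python
from statistics import mean


def rank_descending(scores):
    """Return a rank per candidate (1 = highest score); ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for four source-speaker candidates and one target speaker.
median_subjective = [3.5, 2.0, 4.5, 1.0]  # assumed listening-test medians
estimated = [3.0, 2.5, 4.0, 3.2]          # assumed regression outputs

subjective_ranks = rank_descending(median_subjective)
estimated_ranks = rank_descending(estimated)
rank_correlation = pearson(subjective_ranks, estimated_ranks)

# The candidate with estimated rank 1 would be the selected source speaker.
best_candidate = estimated_ranks.index(1)
print(subjective_ranks, estimated_ranks, round(rank_correlation, 2))
```

A value near 1 would indicate that the estimated ranking closely follows the listeners' ranking, which is how the reported figures of 0.84, 0.78, and 0.58 can be read under this assumption.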
Pages: 480-491
Page count: 12