Automatic source speaker selection for voice conversion

Authors
Turk, Oytun [1]
Arslan, Levent M. [2]
Affiliations
[1] Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey
[2] Sestek Inc, R&D Dept, TR-34342 Istanbul, Turkey
Source
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2009 / Vol. 125 / No. 1
Keywords
hearing; learning (artificial intelligence); neural nets; regression analysis; speaker recognition; speech coding; PROCESSING TECHNIQUES; TRANSFORMATION; QUALITY
DOI
10.1121/1.3027445
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper focuses on the importance of source speaker selection for a weighted codebook mapping based voice conversion algorithm. First, the dependency on source speakers is evaluated in a subjective listening test using 180 different source-target pairs from a database of 20 speakers. Subjective scores for similarity to target speaker's voice and quality are obtained. Statistical analysis of scores confirms the dependence of performance on source speakers for both male-to-male and female-to-female transformations. A source speaker selection algorithm is devised given a target speaker and a set of source speaker candidates. For this purpose, an artificial neural network (ANN) is trained that learns the regression between a set of acoustical distance measures and the subjective scores. The estimated scores are used in source speaker ranking. The average cross-correlation coefficient between rankings obtained from median subjective scores and rankings estimated by the algorithm is 0.84 for similarity and 0.78 for quality in male-to-male transformations. The results for female-to-female transformations were less reliable with a cross-correlation value of 0.58 for both similarity and quality.
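The ranking comparison described in the abstract can be illustrated with a minimal sketch. It assumes that the reported cross-correlation coefficient behaves like a Pearson correlation computed on rank vectors; the abstract does not give the exact definition, so this is an assumption. The function names and all score values below are hypothetical and are not taken from the paper.

```python
import numpy as np

def rank_descending(scores):
    """Return 1-based ranks where rank 1 is the highest score.

    Ties are broken by original order (stable sort); this is an
    illustrative choice, not something stated in the abstract.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

def rank_correlation(subjective, estimated):
    """Pearson correlation between the two rank vectors (assumed metric)."""
    r_subj = rank_descending(subjective)
    r_est = rank_descending(estimated)
    return float(np.corrcoef(r_subj, r_est)[0, 1])

# Hypothetical scores for five candidate source speakers and one target.
median_subjective = [3.5, 2.0, 4.5, 1.0, 3.0]  # made-up listening-test medians
estimated = [3.2, 2.4, 4.1, 1.5, 3.6]          # made-up model estimates

ranks_subj = rank_descending(median_subjective)
ranks_est = rank_descending(estimated)
best_source = int(np.argmin(ranks_est))  # index of the top-ranked candidate
corr = rank_correlation(median_subjective, estimated)

print(ranks_subj.tolist(), ranks_est.tolist(), best_source, round(corr, 2))
```

In this sketch the candidate with the highest estimated score is selected as the source speaker, and the correlation indicates how closely the estimated ordering follows the ordering implied by the listening-test medians.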
Pages: 480-491
Page count: 12