On the Use of I-vectors and Average Voice Model for Voice Conversion without Parallel Data

Cited by: 3
Authors
Wu, Jie [1]
Wu, Zhizheng [2]
Xie, Lei [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Shaanxi Prov Key Lab Speech & Image Informat Proc, Xian, Peoples R China
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9YL, Midlothian, Scotland
Funding
US National Science Foundation;
Keywords
voice conversion; nonparallel training; average voice model; i-vector; long short-term memory;
DOI
10.1109/APSIPA.2016.7820901
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Recently, deep and/or recurrent neural networks (DNNs/RNNs) have been employed for voice conversion, and have significantly improved the quality of converted speech. However, DNNs/RNNs generally require a large amount of parallel training data (e.g., hundreds of utterances) from the source and target speakers. Collecting such a large amount of data is expensive, and impossible in some applications, such as cross-lingual conversion. To solve this problem, we propose to use an average voice model and i-vectors for long short-term memory (LSTM) based voice conversion, which does not require parallel data from the source and target speakers. The average voice model is trained using other speakers' data, and the i-vectors, compact vectors representing the identities of the source and target speakers, are extracted independently. Subjective evaluation has confirmed the effectiveness of the proposed approach.
Pages: 6
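
To make the idea in the abstract concrete, below is a minimal, illustrative sketch (not the authors' implementation): an LSTM regression network whose per-frame input is the source speaker's acoustic feature vector concatenated with an i-vector encoding speaker identity. The class name, feature dimensions, and training snippet are assumptions for illustration; the paper's actual average-voice training setup, feature extraction, and i-vector estimation are not reproduced here.

```python
# Minimal sketch (assumed shapes, not the authors' code): an LSTM that maps
# source acoustic frames to converted frames, conditioned on an i-vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IVectorLSTMConverter(nn.Module):
    def __init__(self, feat_dim=60, ivec_dim=100, hidden_dim=256, num_layers=2):
        super().__init__()
        # Per-frame input = acoustic features concatenated with the speaker i-vector.
        self.lstm = nn.LSTM(feat_dim + ivec_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, feat_dim)

    def forward(self, feats, ivec):
        # feats: (batch, frames, feat_dim); ivec: (batch, ivec_dim)
        ivec_seq = ivec.unsqueeze(1).expand(-1, feats.size(1), -1)
        x = torch.cat([feats, ivec_seq], dim=-1)
        hidden, _ = self.lstm(x)
        return self.out(hidden)

# Toy usage with random tensors. In the average-voice regime described in the
# abstract, each training utterance would be paired with its own speaker's
# i-vector; at conversion time the target speaker's i-vector is supplied
# instead, so no parallel source-target data is needed.
model = IVectorLSTMConverter()
feats = torch.randn(4, 200, 60)   # 4 utterances, 200 frames, 60-dim spectral features (assumed)
ivecs = torch.randn(4, 100)       # 100-dim i-vectors (assumed)
predicted = model(feats, ivecs)
loss = F.mse_loss(predicted, feats)  # placeholder target, for illustration only
loss.backward()
```

The design choice this sketch highlights is that speaker identity enters the model only through the i-vector appended to every frame, which is what lets a single model trained on many non-target speakers be repurposed for an unseen source-target pair.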