On the Use of I-vectors and Average Voice Model for Voice Conversion without Parallel Data

Cited by: 3
Authors
Wu, Jie [1 ]
Wu, Zhizheng [2 ]
Xie, Lei [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Shaanxi Prov Key Lab Speech & Image Informat Proc, Xian, Peoples R China
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9YL, Midlothian, Scotland
Funding
U.S. National Science Foundation;
Keywords
voice conversion; nonparallel training; average voice model; i-vector; long short-term memory;
DOI
10.1109/APSIPA.2016.7820901
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Recently, deep and/or recurrent neural networks (DNNs/RNNs) have been employed for voice conversion and have significantly improved the quality of converted speech. However, DNNs/RNNs generally require a large amount of parallel training data (e.g., hundreds of utterances) from the source and target speakers. Collecting that much data is expensive, and impossible in some applications such as cross-lingual conversion. To address this problem, we propose to use an average voice model and i-vectors for long short-term memory (LSTM) based voice conversion, which does not require parallel data from the source and target speakers. The average voice model is trained on other speakers' data, and the i-vectors, compact vectors representing the identities of the source and target speakers, are extracted independently. Subjective evaluation confirms the effectiveness of the proposed approach.
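To make the conditioning idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of an i-vector-conditioned LSTM converter in PyTorch: a single shared ("average voice") recurrent network maps source spectral frames to target spectral frames, with the target speaker's i-vector concatenated to every input frame so speaker identity enters only through that vector. The class name, feature dimensions, and hyperparameters are illustrative assumptions; for brevity it conditions on the target i-vector only.

# Minimal sketch under the assumptions stated above; dimensions are illustrative.
import torch
import torch.nn as nn

class IVectorConditionedLSTM(nn.Module):
    """Shared ("average voice") LSTM converter conditioned on a target i-vector."""
    def __init__(self, feat_dim=40, ivec_dim=100, hidden_dim=256, num_layers=2):
        super().__init__()
        # One recurrent backbone is trained on many speakers' data; speaker
        # identity is supplied only via the i-vector appended to each frame.
        self.lstm = nn.LSTM(feat_dim + ivec_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, feat_dim)  # predicted target spectra

    def forward(self, src_feats, target_ivec):
        # src_feats: (batch, frames, feat_dim); target_ivec: (batch, ivec_dim)
        frames = src_feats.size(1)
        ivec = target_ivec.unsqueeze(1).expand(-1, frames, -1)  # repeat per frame
        x = torch.cat([src_feats, ivec], dim=-1)
        out, _ = self.lstm(x)
        return self.proj(out)

# Usage: convert a 200-frame source utterance toward a target speaker.
model = IVectorConditionedLSTM()
src_frames = torch.randn(1, 200, 40)   # e.g. mel-cepstral features of the source
target_ivec = torch.randn(1, 100)      # i-vector extracted for the target speaker
converted = model(src_frames, target_ivec)  # shape (1, 200, 40)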
Pages: 6
Related Papers
50 records (10 shown)
  • [1] Voice Verification Using I-vectors and Neural Networks with Limited Training Data
    Mamyrbayev, O. Zh.; Othman, M.; Akhmediyarova, A. T.; Kydyrbekova, A. S.; Mekebayev, N. O.
    Bulletin of the National Academy of Sciences of the Republic of Kazakhstan, 2019, (03): 36-43
  • [2] Identification of Voice Quality Variation Using I-vectors
    Feng, Chuyao; van Leer, Eva; Anderson, David V.
    2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019: 105-109
  • [3] Spectral Mapping Using Prior Re-Estimation of i-Vectors and System Fusion for Voice Conversion
    Pal, Monisankha; Saha, Goutam
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25 (11): 2071-2084
  • [4] Sparse Representation of Phonetic Features for Voice Conversion with and without Parallel Data
    Sisman, Berrak; Li, Haizhou; Tan, Kay Chen
    2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2017: 677-684
  • [5] Using i-vectors from voice features to identify major depressive disorder
    Di, Yazheng; Wang, Jingying; Li, Weidong; Zhu, Tingshao
    Journal of Affective Disorders, 2021, 288: 161-166
  • [6] I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry
    Hautamaki, Rosa Gonzalez; Kinnunen, Tomi; Hautamaki, Ville; Leino, Timo; Laukkanen, Anne-Maria
    14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013), Vols 1-5, 2013: 930-934
  • [7] ARVC: An Auto-Regressive Voice Conversion System Without Parallel Training Data
    Lian, Zheng; Wen, Zhengqi; Zhou, Xinyong; Pu, Songbai; Zhang, Shengkai; Tao, Jianhua
    INTERSPEECH 2020, 2020: 4706-4710
  • [8] Phonetic Posteriorgrams for Many-to-One Voice Conversion Without Parallel Data Training
    Sun, Lifa; Li, Kun; Wang, Hao; Kang, Shiyin; Meng, Helen
    2016 IEEE International Conference on Multimedia & Expo (ICME), 2016
  • [9] Singing Voice Conversion with Non-Parallel Data
    Chen, Xin; Chu, Wei; Guo, Jinxi; Xu, Ning
    2019 2nd IEEE Conference on Multimedia Information Processing and Retrieval (MIPR 2019), 2019: 292-296
  • [10] DeepConversion: Voice conversion with limited parallel training data
    Zhang, Mingyang; Sisman, Berrak; Zhao, Li; Li, Haizhou
    Speech Communication, 2020, 122: 31-43