Cross-Lingual Voice Conversion using a Cyclic Variational Auto-encoder and a WaveNet Vocoder

Cited by: 0
Authors
Nakatani, Hikaru [1]
Tobing, Patrick Lumban [1]
Takeda, Kazuya [1]
Toda, Tomoki [1]
Affiliations
[1] Nagoya Univ, Nagoya, Aichi, Japan
Keywords
NEURAL-NETWORKS; SPEECH;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We propose a novel, cross-lingual voice conversion (VC) method using a cyclic variational auto-encoder (CycleVAE). Voice conversion is the transformation of the voice of one speaker into the voice of another speaker, while cross-lingual VC performs voice conversion between speakers who speak different languages. When using VC methods based on parallel learning, it is necessary to prepare accented speech uttered by the source or target speaker, using the pronunciation system of the speaker's mother tongue. On the other hand, VC methods which use a non-parallel learning approach can utilize the natural speech data of both the source and target speakers, produced in their own native languages. It then becomes necessary, however, to deal with the issues of time-alignment and language mismatches. To address these issues, we apply CycleVAE to cross-lingual VC as a sophisticated, non-parallel method of VC. We also apply the WaveNet vocoder in the waveform generation process of CycleVAE-VC to improve overall conversion quality. Our objective and subjective experimental results when performing cross-lingual VC from a native English speaker to a native Japanese speaker confirm that the proposed method achieves a higher level of naturalness and speaker similarity than a conventional RNN-based parallel VC method using accented speech.
Pages: 520 - 526
Number of pages: 7
Related papers
50 in total
  • [31] Tire Pattern Image Classification using Variational Auto-Encoder with Contrastive Learning
    Yang, Jianning
    Xue, Jiahao
    Feng, Xiaodong
    Song, Chaoqi
    Hao, Yu
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2022,
  • [32] Pairwise Context Similarity for Image Retrieval System Using Variational Auto-Encoder
    Yun, Hyeongu
    Kim, Yongil
    Kang, Taegwan
    Jung, Kyomin
    [J]. IEEE ACCESS, 2021, 9 : 34067 - 34077
  • [33] DISENTANGLED SPEECH REPRESENTATION LEARNING FOR ONE-SHOT CROSS-LINGUAL VOICE CONVERSION USING β-VAE
    Lu, Hui
    Wang, Disong
    Wu, Xixin
    Wu, Zhiyong
    Liu, Xunying
    Meng, Helen
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 814 - 821
  • [34] Deep Representation Learning for Code Smells Detection using Variational Auto-Encoder
    Hadj-Kacem, Mouna
    Bouassida, Nadia
    [J]. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [35] Modeling and augmenting of fMRI data using deep recurrent variational auto-encoder
    Qiang, Ning
    Dong, Qinglin
    Liang, Hongtao
    Ge, Bao
    Zhang, Shu
    Sun, Yifei
    Zhang, Cheng
    Zhang, Wei
    Gao, Jie
    Liu, Tianming
    [J]. JOURNAL OF NEURAL ENGINEERING, 2021, 18 (04)
  • [36] Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders
    Huang, Wen-Chin
    Hwang, Hsin-Te
    Peng, Yu-Huai
    Tsao, Yu
    Wang, Hsin-Min
    [J]. 2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 51 - 55
  • [37] Exploring Cross-lingual Singing Voice Synthesis Using Speech Data
    Cao, Yuewen
    Liu, Songxiang
    Kang, Shiyin
    Hu, Na
    Liu, Peng
    Liu, Xunying
    Su, Dan
    Yu, Dong
    Meng, Helen
    [J]. 2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [38] Cross-Lingual Voice Conversion-Based Polyglot Speech Synthesizer for Indian Languages
    Ramani, B.
    Jeeva, Actlin M. P.
    Vijayalakshmi, P.
    Nagarajan, T.
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 775 - 779
  • [39] A New HMM-Based Voice Conversion Methodology Evaluated on Monolingual and Cross-Lingual Conversion Tasks
    Percybrooks, Winston S.
    Moore, Elliot
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (12) : 2298 - 2310
  • [40] A new HMM-based voice conversion methodology evaluated on monolingual and cross-lingual conversion tasks
    Percybrooks, Winston S.
    Moore, Elliot
    [J]. IEEE Transactions on Audio, Speech and Language Processing, 2015, 23 (12) : 2298 - 2310