An Approach to Cross-Lingual Voice Conversion

Cited by: 0
Authors
Rallabandi, Sai Sirisha [1]
Gangashetty, Suryakanth V. [1]
Affiliations
[1] Int Inst Informat Technol, Speech Proc Lab, Hyderabad, India
Keywords
Deep Neural Networks; Cross-Lingual Voice Conversion; Scaled Exponential Linear Units; Mel Generalised Cepstral Coefficients; Auto-encoded speech;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most multilingual Text-to-Speech (TTS) synthesis systems exhibit an unnatural speaker shift at language boundaries when they are used for code-mixed TTS synthesis. Because collecting polyglot speech is non-trivial, several alternative approaches have been explored. Cross-Lingual Voice Conversion (CLVC) is one of them: it generates speech with the desired speaker and language identities. Our aim in this paper is to design a lightweight CLVC framework between a pair of Mandarin-English speakers. CLVC is more challenging than traditional Voice Conversion (VC) because it must accommodate unaligned corpora from the source and target speakers. We therefore focus on generating a parallel corpus for CLVC and on bridging the gap between speakers and languages. For this purpose, we perform text-independent voice conversion with a conventional three-layer Neural Network (NN). The main contributions are: i) source similarity in both the training and conversion stages of CLVC, ii) generation of a parallel corpus, and iii) text-independent and transcription-free CLVC. The proposed framework uses two variants of a Neural Network: i) an autoencoder, which enables source similarity and the generation of the parallel corpus, and ii) a traditional DNN for feature mapping between the source and the target. Subjective and objective evaluations show that the proposed method is capable of performing CLVC with auto-encoded speech.
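The two-network arrangement described in the abstract can be illustrated with a minimal sketch. Everything beyond the abstract is an assumption made for illustration: the layer sizes, the 25-dimensional vector standing in for a frame of Mel Generalised Cepstral coefficients, the use of SELU in the hidden layers (the keywords mention Scaled Exponential Linear Units, but the abstract does not say where they are applied), and all function names. The weights are random and untrained, so the output only shows the data flow from source frame to auto-encoded frame to mapped frame, not an actual conversion.

```python
import math
import random

# Standard SELU constants (Klambauer et al., 2017).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled Exponential Linear Unit for a single scalar."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def dense(vec, weights, bias, activation=None):
    """Fully connected layer; `weights` holds one row per output unit."""
    out = [sum(w * v for w, v in zip(row, vec)) + b
           for row, b in zip(weights, bias)]
    return [activation(o) for o in out] if activation else out


def init_layer(n_in, n_out, rng):
    """LeCun-normal initialisation, commonly paired with SELU."""
    std = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_network(sizes, rng):
    """Build a stack of dense layers from a list of layer widths."""
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(network, frame):
    """SELU on hidden layers, linear activation on the output layer."""
    for i, (weights, bias) in enumerate(network):
        is_last = i == len(network) - 1
        frame = dense(frame, weights, bias, None if is_last else selu)
    return frame


FEATURE_DIM = 25  # hypothetical size of one spectral feature frame
HIDDEN_DIM = 16   # hypothetical hidden-layer width

rng = random.Random(0)
# Stage 1 (assumed shape): autoencoder that reconstructs source frames.
autoencoder = make_network([FEATURE_DIM, HIDDEN_DIM, FEATURE_DIM], rng)
# Stage 2 (assumed shape): mapping network from auto-encoded to target features.
mapper = make_network([FEATURE_DIM, HIDDEN_DIM, HIDDEN_DIM, FEATURE_DIM], rng)

source_frame = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
auto_encoded = forward(autoencoder, source_frame)
converted = forward(mapper, auto_encoded)
```

In a trained system, the autoencoder output would serve as the source side of the parallel corpus, and the mapping network would be fitted to target-speaker features; neither training loop is shown here.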
Pages: 7