Voice Conversion without Parallel Speech Corpus Based on Mixtures of Linear Transform

Cited by: 1
Authors
Jian, Zhi-Hua [1 ]
Yang, Zhen [1 ]
Institutions
[1] Nanjing Univ Post & Telecommun, Inst Signal Proc & Transmiss, Nanjing, Peoples R China
Keywords
Voice conversion; multimedia application; Ms-LT; EM algorithm
DOI
10.1109/WICOM.2007.701
Chinese Library Classification (CLC)
TP39 [applications of computers]
Subject classification codes
081203; 0835
Abstract
This paper presents a voice conversion algorithm based on mixtures of linear transforms (Ms-LT) that avoids the parallel training data required by conventional approaches. Within a maximum likelihood framework, the EM algorithm is used to estimate the parameters of the conversion function, and the chirp z-transform is applied to enhance the spectral envelope, which is smoothed by the linear weighted averaging. The proposed voice conversion system is evaluated with both objective and subjective measures. The experimental results demonstrate that the approach effectively transforms speaker identity and achieves results comparable to those of conventional methods that rely on a parallel corpus.
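The record does not spell out the exact Ms-LT parameterization, but a common form of a mixture-of-linear-transforms conversion function is F(x) = Σ_k p_k(x)(A_k x + b_k), where p_k(x) is the posterior probability of mixture component k under a GMM on the source features, and A_k, b_k are per-component affine transforms (in the paper, estimated via EM). The sketch below illustrates only the conversion step under that assumption; all names (`gmm_posteriors`, `convert`) and the diagonal-covariance choice are illustrative, not taken from the paper.

```python
import numpy as np

def gmm_posteriors(x, weights, means, covs):
    """Posterior p_k(x) of each component of a diagonal-covariance GMM.

    x: (D,) source feature vector; weights: (K,); means, covs: (K, D).
    """
    diff = x - means                              # (K, D)
    log_det = np.sum(np.log(covs), axis=1)        # log|diag(cov_k)|
    mahal = np.sum(diff ** 2 / covs, axis=1)      # Mahalanobis terms
    log_lik = -0.5 * (mahal + log_det + means.shape[1] * np.log(2 * np.pi))
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()                    # stabilize before exp
    post = np.exp(log_post)
    return post / post.sum()                      # normalized posteriors

def convert(x, weights, means, covs, A, b):
    """Soft mixture of linear transforms: F(x) = sum_k p_k(x) (A_k x + b_k).

    A: (K, D, D) per-component matrices; b: (K, D) per-component offsets.
    """
    p = gmm_posteriors(x, weights, means, covs)   # (K,)
    return np.einsum('k,kij,j->i', p, A, x) + p @ b
```

With a single component the posterior is 1 and `convert` reduces to the plain affine map `A[0] @ x + b[0]`; with several components the output interpolates smoothly between the per-component transforms, which is the averaging the chirp z-transform step is said to compensate for.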
Pages: 2825-2828 (4 pages)