Parallel Voice Conversion Based on a Continuous Sinusoidal Model

Cited by: 0
Authors
Al-Radhi, Mohammed Salah [1]
Csapo, Tamas Gabor [1]
Nemeth, Geza [1]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
Keywords
voice conversion; sinusoidal model; continuous F0; neural network; vocoder
DOI
10.1109/sped.2019.8906565
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The main challenge in current voice conversion is the tradeoff between speaker similarity and computational complexity. To address this problem, this paper introduces a novel sinusoidal model applied to voice conversion (VC) with parallel training data. Conventional source-filter based techniques usually degrade the sound quality and similarity of the converted voice because of parameterization errors and over-smoothing, which lead to a mismatch in the converted characteristics. Therefore, we developed a VC method using a continuous sinusoidal model (CSM), which decomposes the source voice into harmonic components to improve VC performance. In contrast to current VC approaches, our method is motivated by two observations. Firstly, it allows a continuous fundamental frequency (F0), which avoids alignment errors that may occur in voiced and unvoiced segments and degrade the converted speech; this is important for maintaining high converted speech quality. Secondly, we compare our model with two high-quality modern vocoders applied to VC (MagPhase and WORLD), and with a vocoder-free VC framework based on a differential Gaussian mixture model that was recently used in the Voice Conversion Challenge 2018. Finally, similarity and intelligibility are evaluated with objective and subjective measures. Experimental results confirmed that the proposed method obtained higher speaker similarity than the conventional methods.
Pages: 6
Related papers
50 in total
  • [1] Effects of Sinusoidal Model on Non-Parallel Voice Conversion with Adversarial Learning
    Al-Radhi, Mohammed Salah
    Csapo, Tamas Gabor
    Nemeth, Geza
    APPLIED SCIENCES-BASEL, 2021, 11 (16):
  • [2] SPEAKER ADAPTIVE MODEL BASED ON BOLTZMANN MACHINE FOR NON-PARALLEL TRAINING IN VOICE CONVERSION
    Nakashika, Toru
    Minami, Yasuhiro
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5530 - 5534
  • [3] On the Use of I-vectors and Average Voice Model for Voice Conversion without Parallel Data
    Wu, Jie
    Wu, Zhizheng
    Xie, Lei
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [4] Voice Conversion based on Continuous Frequency Warping and Magnitude Scaling
    Ye, Yuhang
    Lawlor, Bob
    2017 28TH IRISH SIGNALS AND SYSTEMS CONFERENCE (ISSC), 2017,
  • [5] NON-PARALLEL TRAINING FOR VOICE CONVERSION BASED ON ADAPTATION METHOD
    Song, Peng
    Zheng, Wenming
    Zhao, Li
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6905 - 6909
  • [6] A novel method for voice conversion based on non-parallel corpus
    Sayadian A.
    Mozaffari F.
    International Journal of Speech Technology, 2017, 20 (3) : 587 - 592
  • [7] Continuous probabilistic transform for voice conversion
    Stylianou, Y
    Cappe, O
    Moulines, E
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (02): : 131 - 142
  • [8] Continuous vocoder applied in deep neural network based voice conversion
    Al-Radhi, Mohammed Salah
    Csapo, Tamas Gabor
    Nemeth, Geza
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (23) : 33549 - 33572
  • [9] Continuous vocoder applied in deep neural network based voice conversion
    Mohammed Salah Al-Radhi
    Tamás Gábor Csapó
    Géza Németh
    Multimedia Tools and Applications, 2019, 78 : 33549 - 33572
  • [10] Statistical Voice Conversion Based on Noisy Channel Model
    Saito, Daisuke
    Watanabe, Shinji
    Nakamura, Atsushi
    Minematsu, Nobuaki
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (06): : 1784 - 1794