Parallel Voice Conversion Based on a Continuous Sinusoidal Model

Cited by: 0
Authors
Al-Radhi, Mohammed Salah [1 ]
Csapo, Tamas Gabor [1 ]
Nemeth, Geza [1 ]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
Keywords
voice conversion; sinusoidal model; continuous F0; neural network; vocoder
DOI
10.1109/sped.2019.8906565
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
A main challenge in current voice conversion is the trade-off between speaker similarity and computational complexity. To tackle these problems, this paper introduces a novel sinusoidal model for voice conversion (VC) with parallel training data. Conventional source-filter based techniques usually degrade the sound quality and similarity of the converted voice because of parameterization errors and over-smoothing, which lead to a mismatch in the converted characteristics. We therefore developed a VC method using a continuous sinusoidal model (CSM), which decomposes the source voice into harmonic components to improve VC performance. In contrast to current VC approaches, our method is motivated by two observations. First, it employs a continuous fundamental frequency (F0), avoiding alignment errors that may occur at voiced and unvoiced segments and degrade the converted speech; this is important for maintaining high converted speech quality. Second, we compare our model with two high-quality modern vocoders (MagPhase and WORLD) applied to VC, and with a vocoder-free VC framework based on a differential Gaussian mixture model that was recently used for the Voice Conversion Challenge 2018. Similarity and intelligibility are finally evaluated with objective and subjective measures. Experimental results confirm that the proposed method achieves higher speaker similarity than the conventional methods.
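To make the harmonic decomposition concrete, the following is a minimal sketch of continuous-F0 sinusoidal synthesis in Python. It is not the CSM implementation described in the paper; the sampling rate, frame period, number of harmonics, and the flat 1/k amplitude roll-off are illustrative assumptions only.

import numpy as np

def synthesize_harmonics(f0_contour, frame_period=0.005, fs=16000, n_harmonics=20):
    # Resynthesize a waveform from a continuous F0 contour as a sum of harmonics.
    # f0_contour: per-frame F0 in Hz, strictly positive (continuous F0, no unvoiced gaps).
    # frame_period, fs, n_harmonics: placeholder values, not taken from the paper.
    frame_times = np.arange(len(f0_contour)) * frame_period
    t = np.arange(0.0, frame_times[-1], 1.0 / fs)
    f0 = np.interp(t, frame_times, f0_contour)  # upsample F0 to the sample rate

    # Instantaneous phase of the fundamental: 2*pi times the running integral of F0.
    phase = 2.0 * np.pi * np.cumsum(f0) / fs

    # Sum the harmonics, keeping only those below the Nyquist frequency;
    # the 1/k amplitude envelope is purely for illustration.
    y = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        below_nyquist = (k * f0) < (fs / 2.0)
        y += below_nyquist * np.cos(k * phase) / k
    return y / np.max(np.abs(y))

# Example: a synthetic F0 contour gliding from 120 Hz to 180 Hz over one second.
waveform = synthesize_harmonics(np.linspace(120.0, 180.0, 200))

Because the contour is continuous, the phase integral never has to be reset at unvoiced frames, which is the property the abstract credits with avoiding voiced/unvoiced alignment errors.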
Pages: 6
Related Papers (50 items in total)
  • [41] Tobing, Patrick Lumban; Wu, Yi-Chiao; Hayashi, Tomoki; Kobayashi, Kazuhiro; Toda, Tomoki: Non-Parallel Voice Conversion with Cyclic Variational Autoencoder. INTERSPEECH 2019, 2019: 674-678.
  • [42] Shah, Nirmesh J.; Patil, Hemant A.: Novel Metric Learning for Non-Parallel Voice Conversion. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019: 3722-3726.
  • [43] Li, Tingle; Liu, Yichen; Hu, Chenxu; Zhao, Hang: CVC: Contrastive Learning for Non-parallel Voice Conversion. INTERSPEECH 2021, 2021: 1324-1328.
  • [44] Song, Peng; Zheng, Wenming; Zhang, Xinran; Jin, Yun; Zha, Cheng; Xin, Minghai: A Novel Iterative Speaker Model Alignment Method from Non-Parallel Speech for Voice Conversion. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2015, E98A(10): 2178-2181.
  • [45] Ma, Ding; Huang, Wen-Chin; Toda, Tomoki: Investigation of Text-to-Speech-based Synthetic Parallel Data for Sequence-to-Sequence Non-Parallel Voice Conversion. 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021: 870-877.
  • [46] Niwa, Jumpei; Yoshimura, Takenori; Hashimoto, Kei; Oura, Keiichiro; Nankaku, Yoshihiko; Tokuda, Keiichi: Statistical Voice Conversion Based on WaveNet. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018: 5289-5293.
  • [47] Sündermann, D.; Ney, H.: VTLN-based Voice Conversion. Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology, 2003: 556-559.
  • [48] Liu, Songxiang; Cao, Yuewen; Wu, Xixin; Sun, Lifa; Liu, Xunying; Meng, Helen: Jointly Trained Conversion Model and WaveNet Vocoder for Non-parallel Voice Conversion using Mel-spectrograms and Phonetic Posteriorgrams. INTERSPEECH 2019, 2019: 714-718.
  • [49] Isako, Takumi; Onishi, Kotaro; Kishida, Takuya; Nakashika, Toru: Controllable Voice Conversion Based on Quantization of Voice Factor Scores. Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022: 1444-1448.
  • [50] Li, Lei; Nankaku, Yoshihiko; Tokuda, Keiichi: A Bayesian Approach to Voice Conversion Based on GMMs Using Multiple Model Structures. 12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5, 2011: 668-671.