Parallel Voice Conversion Based on a Continuous Sinusoidal Model

Cited by: 0
Authors
Al-Radhi, Mohammed Salah [1]
Csapo, Tamas Gabor [1]
Nemeth, Geza [1]
Affiliations
[1] Budapest University of Technology and Economics, Department of Telecommunications and Media Informatics, Budapest, Hungary
Keywords
voice conversion; sinusoidal model; continuous F0; neural network; vocoder
DOI
10.1109/sped.2019.8906565
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The main challenge in current voice conversion (VC) is the tradeoff between speaker similarity and computational complexity. To address these problems, this paper introduces a novel sinusoidal model for VC with parallel training data. Conventional source-filter based techniques usually degrade the sound quality and similarity of the converted voice because of parameterization errors and over-smoothing, which lead to a mismatch in the converted characteristics. We therefore developed a VC method using a continuous sinusoidal model (CSM), which decomposes the source voice into harmonic components to improve VC performance. In contrast to current VC approaches, our method is motivated by two observations. Firstly, it uses a continuous fundamental frequency (F0) to avoid alignment errors that may occur in voiced and unvoiced segments and degrade the converted speech; avoiding them is important for maintaining high converted speech quality. Secondly, we compare our model with two modern high-quality vocoders applied to VC (MagPhase and WORLD), and with a vocoder-free VC framework based on a differential Gaussian mixture model that was recently used in the Voice Conversion Challenge 2018. Finally, similarity and intelligibility are evaluated with objective and subjective measures. Experimental results confirmed that the proposed method obtained higher speaker similarity than the conventional methods.
Pages: 6
Related papers
50 in total
  • [31] Parallel vs. Non-parallel Voice Conversion for Esophageal Speech
    Serrano, Luis
    Raman, Sneha
    Tavarez, David
    Navas, Eva
    Hernaez, Inma
    INTERSPEECH 2019, 2019, : 4549 - 4553
  • [32] Voice conversion using Viterbi algorithm based on Gaussian mixture model
    Jian Zhi-Hua
    Yang Zhen
    2007 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS, VOLS 1 AND 2, 2007, : 40 - 43
  • [33] Voice Conversion Based on State Space Model and Considering Global Variance
    Ahangar, Mohsen
    Ghorbandoost, Mostafa
    Sheikhzadeh, Hamid
    Raahemifar, Kaamran
    Shahrebabaki, Abdoreza Sabzi
    Amini, Jamal
    2013 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (IEEE ISSPIT 2013), 2013, : 416 - 421
  • [34] A singing voice synthesis system based on sinusoidal modeling
    Macon, MW
    JensenLink, L
    Oliverio, J
    Clements, MA
    George, EB
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 435 - 438
  • [35] Additive synthesis based on the continuous wavelet transform: A sinusoidal plus transient model
    Beltrán, JR
    Beltrán, F
    DAFX-03: 6TH INTERNATIONAL CONFERENCE ON DIGITAL AUDIO EFFECTS, PROCEEDINGS, 2003, : 123 - 128
  • [36] Voice Conversion Based on Unified Dictionary with Clustered Features Between Non-parallel Corpus
    Jin, Hui
    Yu, Yi-Biao
    2018 4TH ANNUAL INTERNATIONAL CONFERENCE ON NETWORK AND INFORMATION SYSTEMS FOR COMPUTERS (ICNISC 2018), 2018, : 229 - 232
  • [37] A KL Divergence and DNN-based Approach to Voice Conversion without Parallel Training Sentences
    Xie, Feng-Long
    Soong, Frank K.
    Li, Haifeng
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 287 - 291
  • [38] Transferring Source Style in Non-Parallel Voice Conversion
    Liu, Songxiang
    Cao, Yuewen
    Kang, Shiyin
    Hu, Na
    Liu, Xunying
    Su, Dan
    Yu, Dong
    Meng, Helen
    INTERSPEECH 2020, 2020, : 4721 - 4725
  • [39] Frame Labeling and Mapping for Non-parallel Voice Conversion
    Dong, Minghui
    Yang, Chenyu
    Ehnes, Jochen Walter
    Lu, Yanfeng
    Ming, Huaiping
    Huang, Dongyan
    2017 IEEE 2ND INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP), 2017, : 361 - 365
  • [40] Non-parallel Voice Conversion with Generative Attentional Networks
    Chiu, Tse Wei
    Guo, You Sheng
    Chang, Pao-Chi
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 141 - 145