Continuous vocoder applied in deep neural network based voice conversion

Authors
Mohammed Salah Al-Radhi
Tamás Gábor Csapó
Géza Németh
Affiliations
[1] Budapest University of Technology and Economics, Department of Telecommunications and Media Informatics
[2] MTA-ELTE Lendület Lingual Articulation Research Group
Keywords
Voice conversion; Continuous vocoder; Neural network; Speech synthesis
Abstract
In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC) framework using a deep neural network, in which multiple acoustic features from the speech of two speakers (source and target) are converted. Traditional conversion methods focus on the prosodic feature represented by the discontinuous fundamental frequency (F0) and on the spectral envelope. Studies have shown that the speech analysis/synthesis method plays an important role in the overall quality of the converted voice. Recently, we proposed a new continuous vocoder, originally for statistical parametric speech synthesis, in which all parameters are continuous. This work therefore introduces a new method that uses a continuous F0 (contF0) in SVC to avoid alignment errors that may occur in voiced and unvoiced segments and degrade the converted speech. Our contributions are as follows. (1) We integrate the continuous vocoder, which provides an advanced model of the excitation signal, into the SVC framework by converting its contF0, maximum voiced frequency, and spectral features. (2) We show that a feed-forward deep neural network (FF-DNN) using our vocoder yields high-quality conversion. (3) We apply a geometric approach to spectral subtraction (GA-SS) in the final stage of the proposed framework to improve the signal-to-noise ratio of the converted speech. Our experimental results, obtained with two male speakers and one female speaker, show that the speech converted with the proposed SVC technique is similar to the target speaker and gives state-of-the-art performance as measured by objective evaluation and subjective listening tests.
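The difference between a discontinuous and a continuous F0 track can be illustrated with a small sketch. It is not the contF0 estimator of the continuous vocoder, which the abstract does not describe; it only assumes the common convention that unvoiced frames are stored as 0.0 in a conventional F0 track, and it fills them by plain linear interpolation so that every frame carries a value. The function name `fill_unvoiced` and the sample values are hypothetical.

```python
from typing import List


def fill_unvoiced(f0: List[float]) -> List[float]:
    """Fill unvoiced frames (stored as 0.0) so the contour has no gaps.

    Illustration only: gaps between voiced frames are linearly
    interpolated, and leading/trailing unvoiced frames are held at the
    nearest voiced value. This is an assumed stand-in, not the paper's
    contF0 algorithm.
    """
    voiced = [i for i, value in enumerate(f0) if value > 0.0]
    if not voiced:
        # Nothing to interpolate from; return an unchanged copy.
        return list(f0)

    out = list(f0)

    # Hold the first and last voiced values at the edges.
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]

    # Interpolate linearly across each gap between voiced frames.
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            out[i] = (1.0 - weight) * f0[left] + weight * f0[right]

    return out


# Hypothetical frame-level F0 values in Hz; 0.0 marks unvoiced frames.
f0_track = [0.0, 100.0, 0.0, 0.0, 130.0, 0.0]
cont_track = fill_unvoiced(f0_track)
print(cont_track)
```

With a gap-free contour, source and target frames can be paired without a separate voiced/unvoiced decision, which is the kind of alignment problem the abstract says contF0 is meant to avoid.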
Pages: 33549-33572
Number of pages: 23