VOICE IMPERSONATION USING GENERATIVE ADVERSARIAL NETWORKS

Cited: 0
Authors
Gao, Yang [1 ]
Singh, Rita [1 ]
Raj, Bhiksha [1 ]
Affiliations
[1] Carnegie Mellon Univ, Elect & Comp Engn Dept, Pittsburgh, PA 15213 USA
Keywords
Voice impersonation; generative adversarial network; style transformation; style transfer; conversion
DOI
Not available
CLC Classification Number
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Voice impersonation is not the same as voice transformation, although the latter is an essential element of it. In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the target speaker. In this paper, we propose a novel neural-network-based speech quality- and style-mimicry framework for the synthesis of impersonated voices. The framework is built upon a fast and accurate generative adversarial network model. Given spectrographic representations of the source and target speakers' voices, the model learns to mimic the target speaker's voice quality and style, regardless of the linguistic content of either voice, generating a synthetic spectrogram from which the time-domain signal is reconstructed using the Griffin-Lim method. In effect, this model reframes the well-known problem of style transfer for images as the problem of style transfer for speech signals, while intrinsically addressing the problem of durational variability of speech sounds. Experiments demonstrate that the model can generate extremely convincing samples of impersonated speech. It is even able to impersonate voices across different genders effectively. Results are qualitatively evaluated using standard procedures for evaluating synthesized voices.
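The abstract notes that the time-domain signal is reconstructed from the generated spectrogram with the Griffin-Lim method. As a minimal sketch of that reconstruction step (not the authors' implementation; `n_iter`, `nperseg`, and `noverlap` are illustrative values, not parameters from the paper), the classic Griffin-Lim iteration can be written with NumPy and SciPy:

```python
import numpy as np
from scipy.signal import stft, istft


def griffin_lim(mag, n_iter=50, nperseg=256, noverlap=192):
    """Estimate a time-domain signal from a magnitude spectrogram.

    Starts from random phase, then alternates between inverting the
    complex spectrogram and re-imposing the known magnitudes, keeping
    only the re-estimated phase (Griffin & Lim, 1984).
    """
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        # Invert the current magnitude/phase estimate to a waveform ...
        _, x = istft(mag * phase, nperseg=nperseg, noverlap=noverlap)
        # ... then take its STFT and keep only the phase estimate.
        _, _, Z = stft(x, nperseg=nperseg, noverlap=noverlap)
        phase = np.exp(1j * np.angle(Z))
    _, x = istft(mag * phase, nperseg=nperseg, noverlap=noverlap)
    return x
```

In the framework described above, `mag` would be the synthetic magnitude spectrogram produced by the GAN; here it could equally be the magnitude of any STFT computed with the same window parameters.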
Pages: 2506-2510
Page count: 5