Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning

Cited by: 5
Authors
Zhang, Jing-Xuan [1]
Ling, Zhen-Hua [1]
Dai, Li-Rong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Peoples R China
Source
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
voice conversion; recognition-synthesis; adversarial learning; ATTENTION;
DOI
10.21437/Interspeech.2020-36
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
This paper presents an adversarial learning method for recognition-synthesis based non-parallel voice conversion. A recognizer is used to transform acoustic features into linguistic representations while a synthesizer recovers output features from the recognizer outputs together with the speaker identity. By separating the speaker characteristics from the linguistic representations, voice conversion can be achieved by replacing the speaker identity with the target one. In our proposed method, a speaker adversarial loss is adopted in order to obtain speaker-independent linguistic representations using the recognizer. Furthermore, discriminators are introduced and a generative adversarial network (GAN) loss is used to prevent the predicted features from being over-smoothed. For training model parameters, a strategy of pre-training on a multi-speaker dataset and then fine-tuning on the source-target speaker pair is designed. Our method achieved higher similarity than the baseline model that obtained the best performance in Voice Conversion Challenge 2018.
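The speaker adversarial loss mentioned in the abstract can be illustrated with a minimal sketch. This record does not give the paper's actual formulation, so everything below is an assumption made for illustration: a speaker classifier is scored with cross-entropy on the linguistic representations, and the recognizer's objective subtracts a weighted copy of that cross-entropy so that it is rewarded when the classifier cannot identify the speaker. The names `recognizer_objective`, `content_loss`, and `adv_weight` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy of the softmax distribution against one target class."""
    return -math.log(softmax(logits)[target_index])


def recognizer_objective(content_loss, speaker_logits, speaker_index, adv_weight=0.1):
    """Hypothetical recognizer objective with a speaker adversarial term.

    The speaker classifier would be trained to minimize speaker_ce; the
    recognizer instead subtracts it, so speaker-independent representations
    (high speaker_ce) lower the recognizer's objective.
    Returns (objective, speaker_ce).
    """
    speaker_ce = cross_entropy(speaker_logits, speaker_index)
    return content_loss - adv_weight * speaker_ce, speaker_ce


# Classifier is confident about speaker 0: representations leak identity.
leaky, _ = recognizer_objective(1.0, [4.0, 0.0, 0.0], 0)
# Classifier is at chance over 3 speakers: representations hide identity.
hidden, _ = recognizer_objective(1.0, [0.0, 0.0, 0.0], 0)
print(hidden < leaky)
```

With the same content loss, the objective is lower when the classifier is at chance than when it is confident, which is the direction of pressure the abstract describes. A full system would alternate or jointly optimize the classifier and the recognizer; that training loop is outside this sketch.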
Pages: 771 - 775
Page count: 5
Related papers
50 in total
  • [1] Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion
    Wang, Zhichao
    Zhou, Xinyong
    Yang, Fengyu
    Li, Tao
    Du, Hongqiang
    Xie, Lei
    Gan, Wendong
    Chen, Haitao
    Li, Hai
    [J]. INTERSPEECH 2021, 2021, : 831 - 835
  • [2] Effects of Sinusoidal Model on Non-Parallel Voice Conversion with Adversarial Learning
    Al-Radhi, Mohammed Salah
    Csapo, Tamas Gabor
    Nemeth, Geza
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (16):
  • [3] Non-parallel Many-to-many Singing Voice Conversion by Adversarial Learning
    Hu, Jinsen
    Yu, Chunyan
    Guan, Faqian
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 125 - 132
  • [4] StyleVC: Non-Parallel Voice Conversion with Adversarial Style Generalization
    Hwang, In-Sun
    Lee, Sang-Hoon
    Lee, Seong-Whan
    [J]. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 23 - 30
  • [5] Non-parallel Voice Conversion using Generative Adversarial Networks
    Hasunuma, Yuta
    Hirayama, Chiaki
    Kobayashi, Masayuki
    Nagao, Tomoharu
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 1635 - 1640
  • [6] Non-parallel Voice Conversion Based on Perceptual Star Generative Adversarial Network
    Li, Yanping
    Qiu, Xiangtian
    Cao, Pan
    Zhang, Yan
    Bao, Bingkun
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (08) : 4632 - 4648
  • [7] Non-parallel Voice Conversion Based on Perceptual Star Generative Adversarial Network
    Yanping Li
    Xiangtian Qiu
    Pan Cao
    Yan Zhang
    Bingkun Bao
    [J]. Circuits, Systems, and Signal Processing, 2022, 41 : 4632 - 4648
  • [8] CVC: Contrastive Learning for Non-parallel Voice Conversion
    Li, Tingle
    Liu, Yichen
    Hu, Chenxu
    Zhao, Hang
    [J]. INTERSPEECH 2021, 2021, : 1324 - 1328
  • [9] NOVEL METRIC LEARNING FOR NON-PARALLEL VOICE CONVERSION
    Shah, Nirmesh J.
    Patil, Hemant A.
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3722 - 3726
  • [10] Non-parallel Voice Conversion using Weighted Generative Adversarial Networks
    Paul, Dipjyoti
    Pantazis, Yannis
    Stylianou, Yannis
    [J]. INTERSPEECH 2019, 2019, : 659 - 663