Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning

被引:5
|
作者
Zhang, Jing-Xuan [1 ]
Ling, Zhen-Hua [1 ]
Dai, Li-Rong [1 ]
机构
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Peoples R China
来源
基金
国家重点研发计划; 中国国家自然科学基金;
关键词
voice conversion; recognition-synthesis; adversarial learning; ATTENTION;
D O I
10.21437/Interspeech.2020-36
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
This paper presents an adversarial learning method for recognition-synthesis based non-parallel voice conversion. A recognizer is used to transform acoustic features into linguistic representations while a synthesizer recovers output features from the recognizer outputs together with the speaker identity. By separating the speaker characteristics from the linguistic representations, voice conversion can be achieved by replacing the speaker identity with the target one. In our proposed method, a speaker adversarial loss is adopted in order to obtain speaker-independent linguistic representations using the recognizer. Furthermore, discriminators are introduced and a generative adversarial network (GAN) loss is used to prevent the predicted features from being over-smoothed. For training model parameters, a strategy of pre-training on a multi-speaker dataset and then fine-tuning on the source-target speaker pair is designed. Our method achieved higher similarity than the baseline model that obtained the best performance in Voice Conversion Challenge 2018.
引用
收藏
页码:771 / 775
页数:5
相关论文
共 50 条
  • [21] Non-Parallel Voice Conversion for ASR Augmentation
    Wang, Gary
    Rosenberg, Andrew
    Ramabhadran, Bhuvana
    Biadsy, Fadi
    Huang, Yinghui
    Emond, Jesse
    Mengibar, Pedro Moreno
    [J]. INTERSPEECH 2022, 2022, : 3408 - 3412
  • [22] NON-PARALLEL TRAINING FOR VOICE CONVERSION BASED ON FT-GMM
    Chen, Ling-Hui
    Ling, Zhen-Hua
    Dai, Li-Rong
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5116 - 5119
  • [23] IMPROVING RECOGNITION-SYNTHESIS BASED ANY-TO-ONE VOICE CONVERSION WITH CYCLIC TRAINING
    Chen, Yan-Nian
    Liu, Li-Juan
    Hu, Ya-Jun
    Jiang, Yuan
    Ling, Zhen-Hua
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7007 - 7011
  • [24] MASKCYCLEGAN-VC: LEARNING NON-PARALLEL VOICE CONVERSION WITH FILLING IN FRAMES
    Kaneko, Takuhiro
    Kameoka, Hirokazu
    Tanaka, Kou
    Hojo, Nobukatsu
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5919 - 5923
  • [25] Non-Parallel Voice Conversion with Cyclic Variational Autoencoder
    Tobing, Patrick Lumban
    Wu, Yi-Chiao
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    [J]. INTERSPEECH 2019, 2019, : 674 - 678
  • [26] Non-parallel Voice Conversion with Generative Attentional Networks
    Chiu, Tse Wei
    Guo, You Sheng
    Chang, Pao-Chi
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 141 - 145
  • [27] Frame Labeling and Mapping for Non-parallel Voice Conversion
    Dong, Minghui
    Yang, Chenyu
    Ehnes, Jochen Walter
    Lu, Yanfeng
    Ming, Huaiping
    Huang, Dongyan
    [J]. 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP), 2017, : 361 - 365
  • [28] Transferring Source Style in Non-Parallel Voice Conversion
    Liu, Songxiang
    Cao, Yuewen
    Kang, Shiyin
    Hu, Na
    Liu, Xunying
    Su, Dan
    Yu, Dong
    Meng, Helen
    [J]. INTERSPEECH 2020, 2020, : 4721 - 4725
  • [29] Parallel vs. Non-parallel Voice Conversion for Esophageal Speech
    Serrano, Luis
    Raman, Sneha
    Tavarez, David
    Navas, Eva
    Hernaez, Inma
    [J]. INTERSPEECH 2019, 2019, : 4549 - 4553
  • [30] Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion
    Bargum, Anders R.
    Serafin, Stefania
    Erkut, Cumhur
    [J]. FRONTIERS IN SIGNAL PROCESSING, 2024, 4