A noise-robust voice conversion method with controllable background sounds

Cited by: 0
Authors
Chen, Lele [1 ]
Zhang, Xiongwei [1 ]
Li, Yihao [1 ]
Sun, Meng [1 ]
Chen, Weiwei [1 ]
Institution
[1] Army Engn Univ PLA, Coll Command & Control Engn, Nanjing 210007, Peoples R China
Fund
National Natural Science Foundation of China
Keywords
Noise-robust voice conversion; Dual-decoder structure; Bridge module; Cycle loss; Speech disentanglement; SPEECH ENHANCEMENT; FRAMEWORK;
DOI
10.1007/s40747-024-01375-6
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Background noise is usually treated as redundant or even harmful to voice conversion. Therefore, when converting noisy speech, a pretrained speech separation module is usually deployed to estimate clean speech prior to conversion. However, this can lead to speech distortion due to the mismatch between the separation module and the conversion module. In this paper, a noise-robust voice conversion model is proposed in which a user can freely choose to retain or remove the background sounds. Firstly, a speech separation module with a dual-decoder structure is proposed, where the two decoders decode the denoised speech and the background sounds, respectively. A bridge module captures the interactions between the denoised speech and the background sounds in parallel layers through information exchange. Subsequently, a voice conversion module with multiple encoders is employed to convert the estimated clean speech produced by the speech separation module. Finally, the speech separation and voice conversion modules are jointly trained using a loss function combining cycle loss and mutual information loss, aiming to improve the decoupling efficacy among speech content, pitch, and speaker identity. Experimental results show that the proposed model obtains significant improvements in both subjective and objective evaluation metrics compared with existing baselines. The speech naturalness and speaker similarity of the converted speech reach 3.47 and 3.43, respectively.
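The dual-decoder structure with a bridge module described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimension, number of layers, and the use of untrained random weights with `tanh` activations are all assumptions made for the sketch. The key idea it demonstrates is that each parallel decoder layer receives both its own stream and a bridged projection of the other stream, so the denoised-speech and background-sound estimates exchange information as they are decoded.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(dim_in, dim_out):
    # Random weights stand in for trained parameters (illustration only).
    return rng.standard_normal((dim_in, dim_out)) * 0.1

D = 64       # assumed feature dimension
LAYERS = 2   # assumed number of parallel decoder layers

# Shared encoder, plus two decoders: one for denoised speech, one for background.
W_enc        = linear(D, D)
W_dec_speech = [linear(D, D) for _ in range(LAYERS)]
W_dec_bg     = [linear(D, D) for _ in range(LAYERS)]
# Bridge module: projections that exchange information between the two streams.
W_bridge_s2b = [linear(D, D) for _ in range(LAYERS)]
W_bridge_b2s = [linear(D, D) for _ in range(LAYERS)]

def separate(noisy):
    """noisy: (T, D) frame-level features -> (speech_est, background_est)."""
    h = np.tanh(noisy @ W_enc)   # shared encoding of the noisy input
    hs, hb = h, h                # initialize both decoder streams
    for l in range(LAYERS):
        # Each stream combines its own decoder layer with a bridged
        # projection of the other stream before the nonlinearity.
        hs_next = np.tanh(hs @ W_dec_speech[l] + hb @ W_bridge_b2s[l])
        hb_next = np.tanh(hb @ W_dec_bg[l] + hs @ W_bridge_s2b[l])
        hs, hb = hs_next, hb_next
    return hs, hb                # denoised speech and background estimates

speech, background = separate(rng.standard_normal((100, D)))
print(speech.shape, background.shape)  # (100, 64) (100, 64)
```

Because both estimates are produced explicitly, a downstream system can pass only the speech estimate to the conversion module, or remix the background estimate with the converted speech, which is what makes the background sounds controllable.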
Pages: 3981 - 3994
Page count: 14
Related Papers
50 records
  • [1] Noise-robust voice conversion with domain adversarial training
    Du, Hongqiang
    Xie, Lei
    Li, Haizhou
    [J]. NEURAL NETWORKS, 2022, 148 : 74 - 84
  • [2] Noise-robust voice conversion based on joint dictionary optimization
    Zhang, Shilei
    Jian, Zhihua
    Sun, Minhong
    Zhong, Hua
    Liu, Erxiao
    [J]. Chinese Journal of Acoustics, 2020, 39 (02) : 259 - 272
  • [3] AUDIO-VISUAL VOICE CONVERSION USING NOISE-ROBUST FEATURES
    Sawada, Kohei
    Takehara, Masanori
    Tamura, Satoshi
    Hayamizu, Satoru
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [4] Noise-robust voice conversion using adversarial training with multi-feature decoupling
    Chen, Lele
    Zhang, Xiongwei
    Li, Yihao
    Sun, Meng
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 131
  • [5] On training targets for noise-robust voice activity detection
    Braun, Sebastian
    Tashev, Ivan
    [J]. 29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 421 - 425
  • [6] A Hearing Device With an Adaptive Noise Canceller for Noise-Robust Voice Input
    Miyahara, Ryoji
    Oosugi, Kouji
    Sugiyama, Akihiko
    [J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2019, 65 (04) : 444 - 453
  • [7] Noise-Robust Method for Image Segmentation
    Despotovic, Ivana
    Jelaca, Vedran
    Vansteenkiste, Ewout
    Philips, Wilfried
    [J]. ADVANCED CONCEPTS FOR INTELLIGENT VISION SYSTEMS, PT I, 2010, 6474 : 153 - 162
  • [8] Averaged boosting: A noise-robust ensemble method
    Kim, Y
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2003, 2637 : 388 - 393
  • [9] Noise-Robust Voice Conversion Using High-Quefrency Boosting via Sub-Band Cepstrum Conversion and Fusion
    Miao, Xiaokong
    Sun, Meng
    Zhang, Xiongwei
    Wang, Yimin
    [J]. APPLIED SCIENCES-BASEL, 2020, 10 (01):
  • [10] NOISE-ROBUST VOICE CONVERSION USING A SMALL PARALLEL DATA BASED ON NON-NEGATIVE MATRIX FACTORIZATION
    Aihara, Ryo
    Fujii, Takao
    Nakashika, Toru
    Takiguchi, Tetsuya
    Ariki, Yasuo
    [J]. 2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 315 - 319