A noise-robust voice conversion method with controllable background sounds

Times cited: 0
Authors
Chen, Lele [1]
Zhang, Xiongwei [1]
Li, Yihao [1]
Sun, Meng [1]
Chen, Weiwei [1]
Affiliation
[1] Army Engn Univ PLA, Coll Command & Control Engn, Nanjing 210007, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Noise-robust voice conversion; Dual-decoder structure; Bridge module; Cycle loss; Speech disentanglement; SPEECH ENHANCEMENT; FRAMEWORK
DOI
10.1007/s40747-024-01375-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Background noise is usually treated as redundant or even harmful to voice conversion. Therefore, when noisy speech is converted, a pretrained speech separation module is usually deployed to estimate the clean speech before conversion. However, this can distort the speech because of the mismatch between the separation module and the conversion module. In this paper, a noise-robust voice conversion model is proposed in which users can freely choose to retain or remove the background sounds. First, a speech separation module with a dual-decoder structure is proposed, in which two decoders output the denoised speech and the background sounds, respectively. A bridge module captures the interactions between the denoised speech and the background sounds by exchanging information between parallel layers. Next, a voice conversion module with multiple encoders converts the clean speech estimated by the speech separation module. Finally, the speech separation and voice conversion modules are trained jointly with a loss function that combines a cycle loss and a mutual information loss, aiming to improve the disentanglement of speech content, pitch, and speaker identity. Experimental results show that the proposed model achieves significant improvements over existing baselines on both subjective and objective evaluation metrics. The converted speech obtains scores of 3.47 for naturalness and 3.43 for speaker similarity.
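The abstract describes the bridge module and the retain/remove option only at a high level. The Python sketch below is a hypothetical illustration, not the authors' implementation: it assumes an additive exchange in which each decoder branch receives a linear projection of the other branch's features at the same layer, and it assumes the user's choice is applied by optionally re-adding the separated background to the converted speech. The names `Bridge`, `exchange`, `linear`, and `mix_output` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Bias-free dense layer: returns weights @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


@dataclass
class Bridge:
    """Hypothetical bridge between parallel layers of the two decoders.

    Assumption: each branch adds a linear projection of the other branch's
    features; the paper's actual exchange mechanism is not given in the abstract.
    """
    speech_from_noise: Matrix  # projects background features into the speech branch
    noise_from_speech: Matrix  # projects speech features into the background branch

    def exchange(self, h_speech: Vector, h_noise: Vector) -> Tuple[Vector, Vector]:
        # Both messages are computed from the *input* features,
        # so neither branch sees the other's already-updated state.
        msg_to_speech = linear(self.speech_from_noise, h_noise)
        msg_to_noise = linear(self.noise_from_speech, h_speech)
        new_speech = [a + b for a, b in zip(h_speech, msg_to_speech)]
        new_noise = [a + b for a, b in zip(h_noise, msg_to_noise)]
        return new_speech, new_noise


def mix_output(converted: Vector, background: Vector, keep_background: bool) -> Vector:
    """Hypothetical user control: re-add the separated background only if requested."""
    if not keep_background:
        return list(converted)
    return [c + b for c, b in zip(converted, background)]


if __name__ == "__main__":
    half = [[0.5, 0.0], [0.0, 0.5]]
    bridge = Bridge(speech_from_noise=half, noise_from_speech=half)
    s, n = bridge.exchange([1.0, 0.0], [0.0, 2.0])
    print(s, n)
    print(mix_output([1.0, 2.0], [0.5, 0.5], keep_background=True))
```

In a trained model the projection weights would be learned and the features would be multi-channel tensors; plain lists are used here only to keep the sketch dependency-free.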
Pages: 3981-3994
Number of pages: 14
Related papers (50 in total)
  • [31] Noise-robust synchronized chaotic communications
    Carroll, TL
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-FUNDAMENTAL THEORY AND APPLICATIONS, 2001, 48 (12): 1519-1522
  • [33] Method of Noise-Robust Estimation of Parameters of an Autoregressive Model in the Frequency Domain
    Zadiraka, V. K.
    Semenov, V. Yu.
    Semenova, Ye. V.
    CYBERNETICS AND SYSTEMS ANALYSIS, 2021, 57 (05): 836-842
  • [34] A noise-robust acoustic method for recognizing foraging activities of grazing cattle
    Martinez-Rau, Luciano S.
    Chelotti, Jose O.
    Ferrero, Mariano
    Galli, Julio R.
    Utsumi, Santiago A.
    Planisich, Alejandra M.
    Rufiner, H. Leonardo
    Giovanini, Leonardo L.
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2025, 229
  • [35] Toward noise-robust quantum advantage
    Fefferman, Bill
    NATURE PHYSICS, 2020, 16 (10): 1007-1008
  • [36] Image Fusion Method Using Noise-Robust Contrast Discrimination Measure
    Akashi, Ryuichi
    Shibata, Takashi
    Toda, Masato
    Chono, Keiichi
    2019 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE), 2019
  • [37] ADA-VAD: UNPAIRED ADVERSARIAL DOMAIN ADAPTATION FOR NOISE-ROBUST VOICE ACTIVITY DETECTION
    Kim, Taesoo
    Chang, Jiho
    Ko, Jong Hwan
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 7327-7331
  • [38] An improved noise-robust voice activity detector based on hidden semi-Markov models
    Liang, Yuan
    Liu, Xianglong
    Lou, Yihua
    Shan, Baosong
    PATTERN RECOGNITION LETTERS, 2011, 32 (07): 1044-1053
  • [39] Noise-robust Sleep States Classification Model using Sound Feature Extraction and Conversion
    Ko, Sangkeun
    Min, Seongho
    Choi, Ye Shin
    Kim, Woo-Je
    Lee, Suan
    2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, IEEE BIGCOMP 2024, 2024: 281-286
  • [40] A noise robust voice conversion algorithm based on joint dictionary optimization
    Zhang, Shilei
    Jian, Zhihua
    Sun, Minhong
    Zhong, Hua
    Liu, Erxiao
    Shengxue Xuebao/Acta Acustica, 2019, 44 (06): 1074-1082