A noise-robust voice conversion method with controllable background sounds

Authors
Chen, Lele [1]
Zhang, Xiongwei [1]
Li, Yihao [1]
Sun, Meng [1]
Chen, Weiwei [1]
Affiliations
[1] Army Engn Univ PLA, Coll Command & Control Engn, Nanjing 210007, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Noise-robust voice conversion; Dual-decoder structure; Bridge module; Cycle loss; Speech disentanglement; SPEECH ENHANCEMENT; FRAMEWORK;
DOI
10.1007/s40747-024-01375-6
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Background noise is usually treated as redundant or even harmful to voice conversion. Therefore, when converting noisy speech, a pretrained speech separation module is usually deployed to estimate clean speech prior to the conversion. However, this can lead to speech distortion due to the mismatch between the separation module and the conversion module. In this paper, a noise-robust voice conversion model is proposed, in which a user can freely choose to retain or remove the background sounds. First, a speech separation module with a dual-decoder structure is proposed, where the two decoders decode the denoised speech and the background sounds, respectively. A bridge module captures the interactions between the denoised speech and the background sounds in parallel layers through information exchange. Subsequently, a voice conversion module with multiple encoders is used to convert the estimated clean speech from the speech separation module. Finally, the speech separation and voice conversion modules are jointly trained using a loss function combining cycle loss and mutual information loss, aiming to improve the decoupling efficacy among speech content, pitch, and speaker identity. Experimental results show that the proposed model obtains significant improvements in both subjective and objective evaluation metrics compared with the existing baselines. The speech naturalness and speaker similarity of the converted speech are 3.47 and 3.43, respectively.
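The data flow described in the abstract can be pictured with a toy sketch. Everything below is an illustrative assumption rather than the authors' implementation: `layer`, `bridge`, `dual_decode`, and `remix` are hypothetical names, the "layers" are plain scalings, the bridge is a simple weighted exchange, and adding the background back onto the converted speech is only one possible way to realise the "retain or remove" choice.

```python
from typing import List, Tuple

Vector = List[float]


def layer(x: Vector, scale: float) -> Vector:
    """Stand-in for one decoder layer (assumption: a plain scaling)."""
    return [scale * v for v in x]


def bridge(speech_h: Vector, noise_h: Vector,
           alpha: float = 0.1) -> Tuple[Vector, Vector]:
    """Hypothetical bridge: each branch receives a share of the other's features."""
    new_speech = [s + alpha * n for s, n in zip(speech_h, noise_h)]
    new_noise = [n + alpha * s for s, n in zip(speech_h, noise_h)]
    return new_speech, new_noise


def dual_decode(latent: Vector, num_layers: int = 2) -> Tuple[Vector, Vector]:
    """Run two parallel decoder branches, exchanging information after each layer."""
    speech_h, noise_h = list(latent), list(latent)
    for _ in range(num_layers):
        speech_h = layer(speech_h, 0.5)   # toy speech-decoder layer
        noise_h = layer(noise_h, 2.0)     # toy background-decoder layer
        speech_h, noise_h = bridge(speech_h, noise_h)
    return speech_h, noise_h


def remix(converted: Vector, background: Vector,
          keep_background: bool) -> Vector:
    """Assumed user switch: add the background back, or return speech only."""
    if keep_background:
        return [c + b for c, b in zip(converted, background)]
    return list(converted)
```

In this sketch, `dual_decode` would be followed by the voice conversion step on the speech branch, and `remix` would be applied to its output depending on the user's choice.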
Pages: 3981-3994
Number of pages: 14