A noise-robust voice conversion method with controllable background sounds

Times cited: 0
Authors
Chen, Lele [1]
Zhang, Xiongwei [1]
Li, Yihao [1]
Sun, Meng [1]
Chen, Weiwei [1]
Affiliation
[1] Army Engineering University of PLA, College of Command and Control Engineering, Nanjing 210007, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Noise-robust voice conversion; Dual-decoder structure; Bridge module; Cycle loss; Speech disentanglement; Speech enhancement; Framework
DOI
10.1007/s40747-024-01375-6
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Background noises are usually treated as redundant or even harmful to voice conversion. Therefore, when converting noisy speech, a pretrained speech separation module is usually deployed to estimate the clean speech before conversion. However, this can distort the speech because of the mismatch between the separation module and the conversion module. In this paper, a noise-robust voice conversion model is proposed that lets the user freely choose whether to retain or remove the background sounds. First, a speech separation module with a dual-decoder structure is proposed, in which two decoders output the denoised speech and the background sounds, respectively. A bridge module captures the interactions between the denoised speech and the background sounds by exchanging information between parallel layers of the two decoders. Next, a voice conversion module with multiple encoders converts the clean speech estimated by the speech separation module. Finally, the speech separation and voice conversion modules are trained jointly with a loss function that combines a cycle loss and a mutual information loss, with the aim of improving the disentanglement of speech content, pitch, and speaker identity. Experimental results show that the proposed model achieves significant improvements over existing baselines in both subjective and objective evaluation metrics. The converted speech scores 3.47 for speech naturalness and 3.43 for speaker similarity.
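The abstract describes the architecture only at a high level, so the following Python sketch is a toy illustration under stated assumptions, not the authors' implementation. It assumes that both decoders start from shared encoder features, that the bridge module is a simple additive exchange weighted by a hypothetical `alpha`, that learned layers can be replaced by scalar stand-in functions, that the voice conversion module is a placeholder, and that the user's choice to keep the background is realized by adding the separated background back to the converted speech. The names `bridge`, `dual_decode`, and `render` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
Layer = Callable[[float], float]


def bridge(h_s: Signal, h_b: Signal, alpha: float) -> Tuple[Signal, Signal]:
    """Hypothetical bridge: each branch adds a scaled copy of the other
    branch's features from the same (parallel) layer."""
    out_s = [s + alpha * b for s, b in zip(h_s, h_b)]
    out_b = [b + alpha * s for s, b in zip(h_s, h_b)]
    return out_s, out_b


def dual_decode(
    features: Signal,
    speech_layers: Sequence[Layer],
    background_layers: Sequence[Layer],
    alpha: float = 0.1,
) -> Tuple[Signal, Signal]:
    """Assumed setup: both decoders start from the same encoder features, and
    after every pair of parallel layers the bridge exchanges information."""
    h_s, h_b = list(features), list(features)
    for f_s, f_b in zip(speech_layers, background_layers):
        h_s = [f_s(x) for x in h_s]  # speech-decoder layer (stand-in)
        h_b = [f_b(x) for x in h_b]  # background-decoder layer (stand-in)
        h_s, h_b = bridge(h_s, h_b, alpha)
    return h_s, h_b


def render(converted: Signal, background: Signal,
           keep_background: bool, gain: float = 1.0) -> Signal:
    """User-controlled output: drop the background, or add it back (assumed additive)."""
    if len(converted) != len(background):
        raise ValueError("converted speech and background must have equal length")
    if not keep_background:
        return list(converted)
    return [c + gain * b for c, b in zip(converted, background)]


# Toy usage: scalar lambdas stand in for learned decoder layers.
speech_est, background_est = dual_decode(
    [0.2, -0.4, 0.6], [lambda x: 0.8 * x], [lambda x: 0.2 * x], alpha=0.1
)
converted = [1.5 * x for x in speech_est]  # placeholder for the VC module
clean_out = render(converted, background_est, keep_background=False)
noisy_out = render(converted, background_est, keep_background=True)
print(clean_out)
print(noisy_out)
```

With `keep_background=False` the output is the converted speech alone. With `keep_background=True` the estimated background is remixed at gain `gain`. Remixing after conversion, rather than converting the noisy mixture directly, keeps the background out of the conversion step. The abstract does not state whether the paper recombines the signals in exactly this way.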
Pages: 3981-3994
Number of pages: 14
Related papers (50 in total)
  • [21] A novel noise-robust method for efficient online data decomposition
    Yang, Yiguo
    Li, Shuai
    Wu, Pin
    Feng, Weibing
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2025
  • [22] A noise-robust voice activity detection algorithm using wavelets and support vector machines
    Chen, Shi-Huang
    Chen, Shih-Hao
    IMECS 2007: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, 2007: 447-450
  • [23] MosquitoSong+: A noise-robust deep learning model for mosquito classification from wingbeat sounds
    Supratak, Akara
    Haddawy, Peter
    Yin, Myat Su
    Ziemer, Tim
    Siritanakorn, Worameth
    Assawavinijkulchai, Kanpitcha
    Chiamsakul, Kanrawee
    Chantanalertvilai, Tharit
    Suchalermkul, Wish
    Sa-ngamuang, Chaitawat
    Sriwichai, Patchara
    PLOS ONE, 2024, 19 (10)
  • [24] Noise-Robust Voice Activity Detector Based On Four States-Based HMM
    Zhou, Bin
    Liu, Jing
    Pei, Zheng
    INFORMATION TECHNOLOGY APPLICATIONS IN INDUSTRY II, PTS 1-4, 2013, 411-414: 743-748
  • [25] Noise-Robust Speaker Recognition Combining Missing Data Techniques and Universal Background Modeling
    May, Tobias
    van de Par, Steven
    Kohlrausch, Armin
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (01): 108-121
  • [26] Robustness of Statistical Voice Conversion based on Direct Waveform Modification against Background Sounds
    Kurita, Yusuke
    Kobayashi, Kazuhiro
    Takeda, Kazuya
    Toda, Tomoki
    INTERSPEECH 2019, 2019: 684-688
  • [27] A method to identify noise-robust perceptual features: Application for consonant /t/
    Regnier, Marion S.
    Allen, Jont B.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2008, 123 (05): 2801-2814
  • [28] Noise-robust watermarking for numerical datasets
    Sebé, F
    Domingo-Ferrer, J
    Solanas, A
    MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, 3558: 134-143
  • [29] Toward noise-robust quantum advantage
    Fefferman, Bill
    NATURE PHYSICS, 2020, 16: 1007-1008
  • [30] Similarity measurement method robust to background noise
    Fujitsu Lab, Ltd, Akashi, Japan
    Syst Comput Jpn, 2 (38-46):