DIRECT NOISY SPEECH MODELING FOR NOISY-TO-NOISY VOICE CONVERSION

Cited by: 2
Authors
Xie, Chao [1]
Wu, Yi-Chiao [1]
Tobing, Patrick Lumban [1]
Huang, Wen-Chin [1]
Toda, Tomoki [1]
Affiliations
[1] Nagoya Univ, Nagoya, Aichi, Japan
Keywords
Voice conversion (VC); noisy-to-noisy VC; noisy speech modeling
DOI
10.1109/ICASSP43922.2022.9747894
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Conventional voice conversion (VC) converts speaker information without altering the linguistic content. In some real-world scenarios, however, such as VC in movies/videos and VC in music, where the voice is entangled with background sounds, the background sounds are informative and need to be retained. We have previously developed a noisy-to-noisy (N2N) VC framework that converts the speaker's identity while preserving the background sounds. Although that framework, which consists of a denoising module and a VC module, handles the background sounds well, the VC module is sensitive to the distortion caused by the denoising module. To address this distortion issue, in this paper we propose an improved VC module that directly models the noisy speech waveform while controlling the background sounds. The experimental results demonstrate that the improved framework significantly outperforms the previous one and achieves an acceptable naturalness score, while reaching similarity performance comparable to the upper bound of the framework.
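The data flow of the earlier cascade framework described in the abstract (a denoising module followed by a VC module, with the background sounds retained) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the abstract does not say how the background is recombined, so the residual-and-add step is a guess, and `toy_denoise`, `toy_convert`, and `n2n_pipeline` are hypothetical stand-ins, not the authors' models.

```python
import math
import random

def toy_denoise(noisy, noise):
    """Hypothetical oracle denoiser: subtracts a known noise signal.
    A real denoising module is a learned model and introduces distortion."""
    return [x - n for x, n in zip(noisy, noise)]

def toy_convert(speech, gain=0.5):
    """Hypothetical stand-in for a VC module: only rescales the samples."""
    return [gain * s for s in speech]

def n2n_pipeline(noisy, denoise, convert):
    """Assumed cascade: denoise, take the residual as background,
    convert the estimated speech, then add the background back."""
    est_speech = denoise(noisy)
    est_background = [x - s for x, s in zip(noisy, est_speech)]
    converted = convert(est_speech)
    output = [c + b for c, b in zip(converted, est_background)]
    return output, est_background

random.seed(0)
# Synthetic "speech" (a sine wave) and "background" (uniform noise).
speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [random.uniform(-0.1, 0.1) for _ in range(100)]
noisy = [s + n for s, n in zip(speech, noise)]

output, background = n2n_pipeline(
    noisy, lambda x: toy_denoise(x, noise), toy_convert
)
```

With an oracle denoiser the residual matches the true background. In practice any denoising error propagates into both the speech estimate and the background estimate, which is the sensitivity the abstract attributes to the VC module in the cascade.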
Pages: 6787-6791
Number of pages: 5
Related papers
50 in total
  • [1] Noisy-to-Noisy Voice Conversion Under Variations of Noisy Condition
    Xie, Chao
    Toda, Tomoki
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 3871-3882
  • [2] Noisy-to-Noisy Voice Conversion Framework with Denoising Model
    Xie, Chao
    Wu, Yi-Chiao
    Tobing, Patrick Lumban
    Huang, Wen-Chin
    Toda, Tomoki
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021: 814-820
  • [3] Speech Enhancement-assisted Voice Conversion in Noisy Environments
    Chan, Yun-Ju
    Peng, Chiang-Jen
    Wang, Syu-Siang
    Wang, Hsin-Min
    Tsao, Yu
    Chi, Tai-Shih
    [J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022: 1533-1538
  • [4] Perceptual speech modeling for noisy speech recognition
    Wu, CH
    Chiu, YH
    Lim, H
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002: 385-388
  • [5] Modeling a Noisy-channel for Voice Conversion Using Articulatory Features
    Bollepalli, Bajibabu
    Black, Alan W.
    Prahallad, Kishore
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 2199-2202
  • [6] Pre-processing of noisy speech for voice coders
    Agarwal, T
    Kabal, P
    [J]. 2002 IEEE SPEECH CODING WORKSHOP PROCEEDINGS: A PARADIGM SHIFT TOWARD NEW CODING FUNCTIONS FOR THE BROADBAND AGE, 2002: 169-171
  • [7] Robust Voice Activity Detection Algorithm for Noisy Speech
    Verteletskaya, Ekaterina
    Simak, Boris
    [J]. RTT 2009: 11TH INTERNATIONAL CONFERENCE RTT 2009 RESEARCH IN TELECOMMUNICATION TECHNOLOGY, CONFERENCE PROCEEDINGS, 2009: 98-101
  • [8] Statistical Voice Conversion Based on Noisy Channel Model
    Saito, Daisuke
    Watanabe, Shinji
    Nakamura, Atsushi
    Minematsu, Nobuaki
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20(06): 1784-1794
  • [9] EXEMPLAR-BASED VOICE CONVERSION IN NOISY ENVIRONMENT
    Takashima, Ryoichi
    Takiguchi, Tetsuya
    Ariki, Yasuo
    [J]. 2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012: 313-317
  • [10] Speech Intelligibility Enhancement in Noisy Environments via Voice Conversion with Glimpse Proportion Measure
    Takeuchi, Taiho
    Tatekura, Yosuke
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018: 1713-1717