WHAM!: Extending Speech Separation to Noisy Environments

Cited by: 153
Authors
Wichern, Gordon [1]
Antognini, Joe [2]
Flynn, Michael [2]
Zhu, Licheng Richard [2]
McQuinn, Emmett [2]
Crow, Dwight [2]
Manilow, Ethan [1]
Le Roux, Jonathan [1]
Affiliations
[1] Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139, USA
[2] Whisper.ai, San Francisco, CA, USA
Source: Interspeech 2019
Keywords
source separation; speech enhancement; cocktail party problem; deep clustering; mask inference
DOI
10.21437/Interspeech.2019-2821
Chinese Library Classification (CLC) numbers
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setup, comparing performance when speakers overlap almost completely, at artificially low sampling rates, and with no external background noise. In this paper, we strive to move the field towards more realistic and challenging scenarios. To that end, we created the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset, consisting of two-speaker mixtures from the wsj0-2mix dataset combined with real ambient noise samples. The samples were collected in coffee shops, restaurants, and bars in the San Francisco Bay Area, and are made publicly available. We benchmark various speech separation architectures and objective functions to evaluate their robustness to noise. While separation performance decreases as a result of noise, we still observe substantial gains relative to the noisy signals for most approaches.
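The abstract describes two mechanics: adding recorded ambient noise to existing two-speaker mixtures at a controlled level, and measuring separation quality as a gain over the noisy input. Below is a minimal, stdlib-only Python sketch of both under stated assumptions. The SNR is measured against the whole speech mixture. The metric is scale-invariant SDR (SI-SDR) and its improvement over the unprocessed mixture, which is one common choice for this kind of benchmark. The helper names (`power`, `mix_with_noise`, `si_sdr`, `si_sdr_improvement`) and the synthetic tone and Gaussian-noise signals are hypothetical stand-ins, not the WHAM! scripts or data. The released dataset recipe may define the SNR reference signal and level ranges differently.

```python
import math
import random


def power(x):
    """Mean-square power of a signal stored as a list of floats."""
    return sum(v * v for v in x) / len(x)


def mix_with_noise(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it.

    Assumption (hypothetical helper): the SNR reference is the whole speech
    mixture; the actual WHAM! recipe may use a different reference signal.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio (dB) of `estimate` w.r.t. `reference`."""
    alpha = sum(e * r for e, r in zip(estimate, reference)) / sum(r * r for r in reference)
    target = [alpha * r for r in reference]  # projection of the estimate onto the reference
    residual = [e - t for e, t in zip(estimate, target)]  # everything not explained by it
    residual_energy = sum(d * d for d in residual)
    if residual_energy == 0.0:
        return math.inf
    return 10 * math.log10(sum(t * t for t in target) / residual_energy)


def si_sdr_improvement(estimate, mixture, reference):
    """Gain in SI-SDR of a separated estimate over the unprocessed noisy mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Usage with synthetic stand-ins (hypothetical, not WHAM! audio): two tones act
# as "speakers" and Gaussian noise acts as "ambient noise", 1 s at 8 kHz.
random.seed(0)
fs = 8000
speaker1 = [math.sin(2 * math.pi * 220 * t / fs) for t in range(fs)]
speaker2 = [0.5 * math.sin(2 * math.pi * 330 * t / fs) for t in range(fs)]
speech_mix = [a + b for a, b in zip(speaker1, speaker2)]
noise = [random.gauss(0.0, 1.0) for _ in range(fs)]

noisy_mix = mix_with_noise(speech_mix, noise, snr_db=0.0)
# Pretend a separator recovered speaker 1 with a small residual error.
estimate1 = [s + 0.01 * n for s, n in zip(speaker1, noise)]
gain_db = si_sdr_improvement(estimate1, noisy_mix, speaker1)
```

Scaling the noise by a single gain keeps its spectral character intact, so only the overall level changes; SI-SDR is used here because rescaling an estimate does not change its score, which matters when a separator's output gain is arbitrary.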
Pages: 1368-1372
Number of pages: 5