WHAM!: Extending Speech Separation to Noisy Environments

Cited by: 153
Authors
Wichern, Gordon [1]
Antognini, Joe [2]
Flynn, Michael [2]
Zhu, Licheng Richard [2]
McQuinn, Emmett [2]
Crow, Dwight [2]
Manilow, Ethan [1]
Le Roux, Jonathan [1]
Affiliations
[1] Mitsubishi Elect Res Labs MERL, Cambridge, MA 02139 USA
[2] Whisper Ai, San Francisco, CA USA
Source
Interspeech 2019
Keywords
source separation; speech enhancement; cocktail party problem; deep clustering; mask inference
DOI
10.21437/Interspeech.2019-2821
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setup, comparing performance when speakers overlap almost completely, at artificially low sampling rates, and with no external background noise. In this paper, we strive to move the field towards more realistic and challenging scenarios. To that end, we created the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset, consisting of two-speaker mixtures from the wsj0-2mix dataset combined with real ambient noise samples. The samples were collected in coffee shops, restaurants, and bars in the San Francisco Bay Area, and are made publicly available. We benchmark various speech separation architectures and objective functions to evaluate their robustness to noise. While separation performance decreases as a result of noise, we still observe substantial gains relative to the noisy signals for most approaches.
Pages: 1368-1372
Number of pages: 5