WHAM!: Extending Speech Separation to Noisy Environments

Cited by: 153
Authors
Wichern, Gordon [1]
Antognini, Joe [2]
Flynn, Michael [2]
Zhu, Licheng Richard [2]
McQuinn, Emmett [2]
Crow, Dwight [2]
Manilow, Ethan [1]
Le Roux, Jonathan [1]
Affiliations
[1] Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139, USA
[2] Whisper.ai, San Francisco, CA, USA
Source
Interspeech 2019
Keywords
source separation; speech enhancement; cocktail party problem; deep clustering; mask inference
DOI
10.21437/Interspeech.2019-2821
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setup, comparing performance when speakers overlap almost completely, at artificially low sampling rates, and with no external background noise. In this paper, we strive to move the field towards more realistic and challenging scenarios. To that end, we created the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset, consisting of two-speaker mixtures from the wsj0-2mix dataset combined with real ambient noise samples. The samples were collected in coffee shops, restaurants, and bars in the San Francisco Bay Area, and are made publicly available. We benchmark various speech separation architectures and objective functions to evaluate their robustness to noise. While separation performance decreases as a result of noise, we still observe substantial gains relative to the noisy signals for most approaches.
Pages: 1368-1372
Page count: 5