WHAM!: Extending Speech Separation to Noisy Environments

Cited by: 153
Authors
Wichern, Gordon [1]
Antognini, Joe [2]
Flynn, Michael [2]
Zhu, Licheng Richard [2]
McQuinn, Emmett [2]
Crow, Dwight [2]
Manilow, Ethan [1]
Le Roux, Jonathan [1]
Affiliations
[1] Mitsubishi Elect Res Labs MERL, Cambridge, MA 02139 USA
[2] Whisper Ai, San Francisco, CA USA
Source
INTERSPEECH 2019
Keywords
source separation; speech enhancement; cocktail party problem; deep clustering; mask inference;
DOI
10.21437/Interspeech.2019-2821
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setup, comparing performance when speakers overlap almost completely, at artificially low sampling rates, and with no external background noise. In this paper, we strive to move the field towards more realistic and challenging scenarios. To that end, we created the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset, consisting of two-speaker mixtures from the wsj0-2mix dataset combined with real ambient noise samples. The samples were collected in coffee shops, restaurants, and bars in the San Francisco Bay Area, and are made publicly available. We benchmark various speech separation architectures and objective functions to evaluate their robustness to noise. While separation performance decreases as a result of noise, we still observe substantial gains relative to the noisy signals for most approaches.
Pages: 1368-1372
Number of pages: 5
Related papers
50 in total
  • [1] Modulation domain blind speech separation in noisy environments
    Zhang, Yi
    Zhao, Yunxin
    SPEECH COMMUNICATION, 2013, 55 (10) : 1081 - 1099
  • [2] FEATURE DENOISING FOR SPEECH SEPARATION IN UNKNOWN NOISY ENVIRONMENTS
    Wang, Yuxuan
    Wang, DeLiang
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7472 - 7476
  • [3] SESNet: A Speech Enhancement and Separation Network in Noisy Reverberant Environments
    Wang, Liusong
    Gao, Yuan
    Cao, Kaimin
    Hu, Ying
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2024, 2025, 2312 : 44 - 54
  • [4] GLMSNET: SINGLE CHANNEL SPEECH SEPARATION FRAMEWORK IN NOISY AND REVERBERANT ENVIRONMENTS
    Shi, Huiyu
    Chen, Xi
    Kong, Tianlong
    Yin, Shouyi
    Ouyang, Peng
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 663 - 670
  • [5] Target speech detection and separation for communication with humanoid robots in noisy home environments
    Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501, Japan
    Unknown
    Adv Rob, 15 (2093-2111):
  • [6] Target Speech Detection and Separation for Communication with Humanoid Robots in Noisy Home Environments
    Kim, Hyun-Don
    Kim, Jinsung
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    ADVANCED ROBOTICS, 2009, 23 (15) : 2093 - 2111
  • [7] Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model
    Martel, Hector
    Richter, Julius
    Li, Kai
    Hu, Xiaolin
    Gerkmann, Timo
    INTERSPEECH 2023, 2023, : 1673 - 1677
  • [8] SPEECH COMMUNICATION IN VERY NOISY ENVIRONMENTS
    CHERRY, C
    WILEY, R
    NATURE, 1967, 214 (5093) : 1164 - &
  • [9] SPEECH RECOGNITION IN NOISY ENVIRONMENTS - A SURVEY
    GONG, YF
    SPEECH COMMUNICATION, 1995, 16 (03) : 261 - 291
  • [10] Speech Synthesis enhancement in noisy environments
    Bonardo, Davide
    Zovato, Enrico
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 789 - 792