Time-Frequency Masking For Large Scale Robust Speech Recognition

Cited by: 0
Authors
Wang, Yuxuan [1]
Misra, Ananya [2]
Chin, Kean K. [2]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Google Inc, Mountain View, CA USA
Keywords
Robust speech recognition; time-frequency masking; deep neural network; feature denoising
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Time-frequency mask estimation has shown considerable success recently. In this paper, we demonstrate its utility as a feature enhancement frontend for large vocabulary conversational speech recognition. Additionally, we investigate how masking compares with feature denoising, which directly reconstructs clean features from noisy ones. We train a mask estimator that predicts ideal ratio masks. Experimental results on Google voice search evaluation sets demonstrate that masking is superior to feature denoising, and that a lightweight masking frontend produces significant improvements over a strong baseline. We also show that masking improves the performance of a multi-condition trained (MTR) acoustic model.
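The abstract names ideal ratio masks as the training target without defining them. As a rough illustration only, the sketch below uses one commonly cited definition, in which each time-frequency unit takes the value (S / (S + N))^beta, where S and N are the speech and noise energies in that unit. The function name, the exponent `beta=0.5`, the `eps` guard, and the toy energies are all assumptions made here; this record does not say which exact form or feature domain the paper uses.

```python
def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5, eps=1e-12):
    """Compute a ratio mask for [frame][channel] energy grids.

    Assumed definition: (S / (S + N)) ** beta per time-frequency unit.
    eps only guards against division by zero in silent units.
    """
    mask = []
    for s_frame, n_frame in zip(speech_energy, noise_energy):
        mask.append([(s / (s + n + eps)) ** beta
                     for s, n in zip(s_frame, n_frame)])
    return mask


# Toy example: 2 frames x 2 frequency channels (made-up energies).
speech = [[4.0, 0.0],
          [1.0, 9.0]]
noise = [[4.0, 1.0],
         [3.0, 0.0]]

mask = ideal_ratio_mask(speech, noise)
```

Each mask value lies between 0 and 1: units dominated by noise are pushed toward 0 and units dominated by speech toward 1. In a masking frontend, an estimator would be trained to predict such a mask from noisy features alone, since the separate speech and noise energies are only available when building training targets.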
Pages: 2469-2473
Page count: 5