Time-Frequency Masking For Large Scale Robust Speech Recognition

被引:0
|
作者
Wang, Yuxuan [1 ]
Misra, Ananya [2 ]
Chine, Kean K. [2 ]
机构
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Google Inc, Mountain View, CA USA
关键词
Robust speech recognition; time-frequency masking; deep neural network; feature denoising;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Time-frequency mask estimation has shown considerable success recently. In this paper, we demonstrate its utility as a feature enhancement frontend for large vocabulary conversational speech recognition. Additionally, we investigate how masking compares with feature denoising, which directly reconstructs clean features from noisy ones. We train a mask estimator that predicts ideal ratio masks. Experimental results on Google voice search evaluation sets demonstrate that masking is superior to feature denoising, and a lightweight masking frontend produces significant improvements over a strong baseline. We also show that masking improves performance of a multi condition trained (MTR) acoustic model.
引用
收藏
页码:2469 / 2473
页数:5
相关论文
共 50 条
  • [1] Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition
    Soni, Meet
    Panda, Ashish
    [J]. INTERSPEECH 2019, 2019, : 426 - 430
  • [2] Label-Driven Time-Frequency Masking for Robust Speech Command Recognition
    Soni, Meet
    Sheikh, Imran
    Kopparapu, Sunil Kumar
    [J]. TEXT, SPEECH, AND DIALOGUE (TSD 2019), 2019, 11697 : 341 - 351
  • [3] Robust speech separation using time-frequency masking
    Aarabi, P
    Shi, GJ
    Jahromi, O
    [J]. 2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003, : 741 - 744
  • [4] Robust Automatic Speech Recognition System Based on Using Adaptive Time-Frequency Masking
    Gouda, Ahmed Mostafa
    Tamazin, Mohamed
    Khedr, Mohamed
    [J]. PROCEEDINGS OF 2016 11TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2016, : 181 - 186
  • [5] Cepstral representation of speech motivated by time-frequency masking: An application to speech recognition
    Aikawa, K
    Singer, H
    Kawahara, H
    Tohkura, Y
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1996, 100 (01): : 603 - 614
  • [6] TIME-FREQUENCY CONVOLUTIONAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Mitra, Vikramjit
    Franco, Horacio
    [J]. 2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 317 - 323
  • [7] On time-frequency masking in voiced speech
    Skoglund, J
    Kleijn, WB
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (04): : 361 - 369
  • [8] On the integration of time-frequency masking speech separation and recognition in underdetermined environments
    Jafari, Ingrid
    Haque, Serajul
    Togneri, Roberto
    Nordholm, Sven
    [J]. 2012 CONFERENCE RECORD OF THE FORTY SIXTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2012, : 1613 - 1617
  • [9] Binary and ratio time-frequency masks for robust speech recognition
    Srinivasan, Soundararajan
    Roman, Nicoleta
    Wang, DeLiang
    [J]. SPEECH COMMUNICATION, 2006, 48 (11) : 1486 - 1501
  • [10] Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques
    Kolossa, D
    Klimas, A
    Orglmeister, R
    [J]. 2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2005, : 82 - 85