ROBUST SPEECH RECOGNITION FROM RATIO MASKS

Authors
Wang, Zhong-Qiu [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Robust ASR; Ideal Ratio Mask; Ideal Binary Mask; CNN; DNN; Noise
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Robustness against noise is crucial for automatic speech recognition (ASR) systems in real-world environments. In this paper, we propose a novel approach that performs robust ASR by directly recognizing ratio masks. In the proposed approach, a deep neural network (DNN) is first trained to estimate the ideal ratio mask (IRM) from a noisy utterance, and a convolutional neural network (CNN) is then employed to recognize the estimated IRMs. The proposed approach has been evaluated on the TIDigits corpus, and the results demonstrate that direct recognition of ratio masks outperforms both direct recognition of binary masks and a traditional MMSE-HMM-based method for robust ASR.
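
For readers unfamiliar with the two mask types contrasted in the abstract, the following is a minimal sketch of how an ideal ratio mask and an ideal binary mask are commonly computed from the premixed speech and noise power in each time-frequency unit. The record does not give the paper's exact definitions, so the square-root compression (`beta=0.5`), the 0 dB local criterion (`lc_db`), the `eps` guard, and the function names are all assumptions chosen for illustration. The DNN estimator and CNN recognizer described in the abstract are not shown.

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Soft mask in [0, 1]: share of speech energy in each T-F unit.

    beta=0.5 is a commonly used compression exponent (an assumption here).
    """
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # eps avoids division by zero in units with no energy at all
    return (speech_power / (speech_power + noise_power + eps)) ** beta

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0, eps=1e-12):
    """Hard mask: 1 where the local SNR exceeds lc_db, else 0.

    lc_db=0.0 is an assumed local criterion, not taken from the paper.
    """
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (snr_db > lc_db).astype(float)

# Toy 2x2 "spectrogram" of per-unit speech and noise power (made-up values)
speech = np.array([[4.0, 1.0], [0.0, 9.0]])
noise = np.array([[4.0, 3.0], [1.0, 0.0]])

irm = ideal_ratio_mask(speech, noise)
ibm = ideal_binary_mask(speech, noise)
```

On the toy input, the ratio mask keeps graded values such as 0.5 for a unit whose speech-to-noise power ratio is 1:3, whereas the binary mask sets that unit to 0. This graded information is what distinguishes the ratio mask from the binary mask as a recognition input.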
Pages: 5720-5724
Number of pages: 5