ROBUST SPEECH RECOGNITION FROM RATIO MASKS

Cited by: 0

Authors:

Wang, Zhong-Qiu ^{[1]}

Wang, DeLiang ^{[1,2]}

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA

Source:

2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS | 2016

Keywords:

Robust ASR; Ideal Ratio Mask; Ideal Binary Mask; CNN; DNN; NOISE;

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Robustness against noise is crucial for automatic speech recognition (ASR) systems in real-world environments. In this paper, we propose a novel approach that performs robust ASR by directly recognizing ratio masks. In the proposed approach, a deep neural network (DNN) is first trained to estimate the ideal ratio mask (IRM) from a noisy utterance, and a convolutional neural network (CNN) is then employed to recognize the estimated IRMs. The proposed approach has been evaluated on the TIDigits corpus, and the results demonstrate that direct recognition of ratio masks outperforms both direct recognition of binary masks and a traditional MMSE-HMM based method for robust ASR.
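For readers unfamiliar with the two mask types the abstract compares, the sketch below computes an ideal ratio mask and an ideal binary mask from per-cell speech and noise power. It is illustrative only: the record does not give the paper's exact definitions, so the square-root compression (`beta=0.5`), the 0 dB local criterion (`lc_db`), the function names, and the toy input values are all assumptions drawn from common usage in the masking literature, not from this paper.

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Soft mask in [0, 1]: fraction of each time-frequency cell's power
    that belongs to speech, compressed by the exponent beta (assumed 0.5)."""
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    return (speech_power / (speech_power + noise_power + eps)) ** beta

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0, eps=1e-12):
    """Hard mask: 1 where the local SNR exceeds lc_db (assumed 0 dB), else 0."""
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (snr_db > lc_db).astype(float)

# Toy 2x2 "spectrogram" of speech and noise power (hypothetical values).
speech = np.array([[4.0, 1.0], [0.0, 9.0]])
noise = np.array([[4.0, 3.0], [1.0, 1.0]])

irm = ideal_ratio_mask(speech, noise)
ibm = ideal_binary_mask(speech, noise)
print(irm)
print(ibm)
```

The ratio mask keeps graded information about how much of each cell is speech, while the binary mask collapses that to a yes/no decision per cell. In the approach described above, a DNN would estimate such a ratio mask from noisy input and a CNN would classify the estimated mask; neither network is sketched here.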

Citation

Pages: 5720-5724

Page count: 5
