Time-Frequency Masking For Large Scale Robust Speech Recognition

Times cited: 0

Authors:

Wang, Yuxuan [1]

Misra, Ananya [2]

Chin, Kean K. [2]

Affiliations:

[1] Ohio State Univ, Columbus, OH 43210 USA

[2] Google Inc, Mountain View, CA USA

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

Robust speech recognition; time-frequency masking; deep neural network; feature denoising

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Time-frequency (T-F) mask estimation has recently shown considerable success. In this paper, we demonstrate its utility as a feature-enhancement frontend for large-vocabulary conversational speech recognition. We also investigate how masking compares with feature denoising, which directly reconstructs clean features from noisy ones. We train a mask estimator that predicts ideal ratio masks. Experimental results on Google voice search evaluation sets show that masking outperforms feature denoising, and that a lightweight masking frontend yields significant improvements over a strong baseline. We also show that masking improves the performance of a multi-condition trained (MTR) acoustic model.
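The abstract's training target, the ideal ratio mask (IRM), can be illustrated with a short sketch. It assumes one common IRM definition from the masking literature, (S^2 / (S^2 + N^2))^beta with beta = 0.5, computed per time-frequency unit from clean and noise magnitudes, and it assumes clean and noise powers add in the mixture. This record does not state the paper's exact feature domain or exponent, and the function names below are hypothetical.

```python
import math


def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5, eps=1e-12):
    """Per-unit IRM: (S^2 / (S^2 + N^2)) ** beta.

    clean_mag, noise_mag: equal-length lists of magnitudes for one frame.
    beta=0.5 is an assumed (commonly used) exponent, not taken from the paper.
    eps avoids division by zero for silent units.
    """
    return [
        ((s * s) / (s * s + n * n + eps)) ** beta
        for s, n in zip(clean_mag, noise_mag)
    ]


def apply_mask(noisy_mag, mask):
    """Masking frontend: scale each noisy unit by a gain in [0, 1]."""
    return [y * m for y, m in zip(noisy_mag, mask)]


if __name__ == "__main__":
    clean = [3.0, 0.5, 4.0]
    noise = [4.0, 2.0, 0.0]
    # Assumption: powers add, so the noisy magnitude is sqrt(S^2 + N^2).
    noisy = [math.sqrt(s * s + n * n) for s, n in zip(clean, noise)]
    mask = ideal_ratio_mask(clean, noise)
    enhanced = apply_mask(noisy, mask)
    print([round(m, 3) for m in mask])
    print([round(e, 3) for e in enhanced])
```

Because the mask is bounded in [0, 1] and multiplies the noisy input, the estimator only needs to predict a per-unit gain; a feature-denoising frontend, by contrast, must regress the clean feature values directly. Under the additive-power assumption above, the ideal mask recovers the clean magnitudes exactly.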


Pages: 2469-2473

Number of pages: 5

Similar literature (50 records in total):

[1] Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition
Soni, Meet
Panda, Ashish
[J]. INTERSPEECH 2019, 2019: 426-430
[2] Label-Driven Time-Frequency Masking for Robust Speech Command Recognition
Soni, Meet
Sheikh, Imran
Kopparapu, Sunil Kumar
[J]. TEXT, SPEECH, AND DIALOGUE (TSD 2019), 2019, 11697: 341-351
[3] Robust speech separation using time-frequency masking
Aarabi, P
Shi, GJ
Jahromi, O
[J]. 2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003: 741-744
[4] Robust Automatic Speech Recognition System Based on Using Adaptive Time-Frequency Masking
Gouda, Ahmed Mostafa
Tamazin, Mohamed
Khedr, Mohamed
[J]. PROCEEDINGS OF 2016 11TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2016: 181-186
[5] Cepstral representation of speech motivated by time-frequency masking: An application to speech recognition
Aikawa, K
Singer, H
Kawahara, H
Tohkura, Y
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1996, 100(01): 603-614
[6] TIME-FREQUENCY CONVOLUTIONAL NETWORKS FOR ROBUST SPEECH RECOGNITION
Mitra, Vikramjit
Franco, Horacio
[J]. 2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015: 317-323
[7] On time-frequency masking in voiced speech
Skoglund, J
Kleijn, WB
[J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8(04): 361-369
[8] On the integration of time-frequency masking speech separation and recognition in underdetermined environments
Jafari, Ingrid
Haque, Serajul
Togneri, Roberto
Nordholm, Sven
[J]. 2012 CONFERENCE RECORD OF THE FORTY SIXTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2012: 1613-1617
[9] Binary and ratio time-frequency masks for robust speech recognition
Srinivasan, Soundararajan
Roman, Nicoleta
Wang, DeLiang
[J]. SPEECH COMMUNICATION, 2006, 48(11): 1486-1501
[10] Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques
Kolossa, D
Klimas, A
Orglmeister, R
[J]. 2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2005: 82-85
