Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques

Cited by: 32

Authors:

Kolossa, D [1]

Klimas, A [1]

Orglmeister, R [1]

Affiliation:

[1] Tech Univ Berlin, D-10587 Berlin, Germany

Source:

2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA) | 2005

DOI:

10.1109/ASPAA.2005.1540174

Chinese Library Classification number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Time-frequency masking has emerged as a powerful technique for source separation of noisy and convolved speech mixtures. It has also been applied successfully to noisy speech recognition; see e.g. [1, 2]. But while significant SNR gains are possible with suitable masking functions, speech recognition performance suffers from the nonlinear operations involved, so that the greatly improved SNR often contrasts with only slight improvements in the recognition rate. To address this problem, marginalization techniques have been used for speech recognition [3, 4], but they rely on speech recognition and source separation being carried out in the same domain. However, source separation and de-noising are often carried out in the short-time Fourier transform (STFT) domain, whereas the most useful speech recognition features are e.g. mel-frequency cepstral coefficients (MFCCs), LPC cepstral coefficients and VQ features. In these cases, marginalization techniques are not directly applicable. Here, another approach is suggested, which estimates sufficient statistics for speech features in the preprocessing (e.g. STFT) domain, propagates these statistics through the transforms from the spectrum to e.g. the MFCCs of a speech recognition system, and uses the estimated statistics for missing data speech recognition. With this approach, significant gains can be achieved in speech recognition rates, and in this context, time-frequency masking yields recognition rate improvements of more than 35% when compared to TF-masking based source separation.

Pages: 82-85

Page count: 4

50 records in total

[31] Blind speech source separation via nonlinear time-frequency masking
XU Shun CHEN Shaorong LIU Yulin (DSP Lab.
Chinese Journal of Acoustics, 2008, (03) : 203 - 214
[32] A TIME-FREQUENCY BLIND SEPARATION METHOD FOR UNDERDETERMINED SPEECH MIXTURES
Lv Yao Li Shuangtian(Institute of Acoustics
Journal of Electronics(China), 2008, (05) : 702 - 708
[33] Underdetermined blind separation of audio sources from the time-frequency representation of their convolutive mixtures
Aissa-El-Bey, Abdeldjalil
Abed-Meraim, Karim
Grenier, Yves
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PTS 1-3, PROCEEDINGS, 2007, : 153 - 156
[34] Online blind speech separation using multiple acoustic speaker tracking and time-frequency masking
Pertila, P.
COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03) : 683 - 702
[35] Binary and ratio time-frequency masks for robust speech recognition
Srinivasan, Soundararajan
Roman, Nicoleta
Wang, DeLiang
SPEECH COMMUNICATION, 2006, 48 (11) : 1486 - 1501
[36] Separation of Cardiorespiratory Sounds Using Time-Frequency Masking and Sparsity
Shah, Ghafoor
Papadias, Constantinos
2013 18TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2013,
[37] Missing data techniques using voicing probability for robust automatic speech recognition
Kim, LY
Cho, HY
Oh, YH
ELECTRONICS LETTERS, 2001, 37 (11) : 723 - 724
[38] Robust speech recognition using missing data techniques in the PROSPECT domain and fuzzy masks
Van Segbroeck, Maarten
Van Hamme, Hugo
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4393 - 4396
[39] Robust sparse time-frequency analysis for data missing scenarios
Chen, Yingpin
Huang, Yuming
Song, Jianhua
IET SIGNAL PROCESSING, 2023, 17 (01)
[40] Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions
Dorothea Kolossa
Ramon Fernandez Astudillo
Eugen Hoffmann
Reinhold Orglmeister
EURASIP Journal on Audio, Speech, and Music Processing, 2010
