Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques

Cited by: 32
Authors
Kolossa, D [1]
Klimas, A [1]
Orglmeister, R [1]
Affiliations
[1] Tech Univ Berlin, D-10587 Berlin, Germany
DOI
10.1109/ASPAA.2005.1540174
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Time-frequency masking has emerged as a powerful technique for source separation of noisy and convolved speech mixtures. It has also been applied successfully to noisy speech recognition, see e.g. [1, 2]. But while adequate masking functions can yield significant SNR gains, speech recognition performance suffers from the nonlinear operations involved, so that the greatly improved SNR often contrasts with only slight improvements in the recognition rate. To address this problem, marginalization techniques have been used for speech recognition [3, 4], but they rely on speech recognition and source separation being carried out in the same domain. However, source separation and de-noising are often carried out in the short-time Fourier transform (STFT) domain, whereas the most useful speech recognition features are e.g. mel-frequency cepstral coefficients (MFCCs), LPC cepstral coefficients and VQ features. In these cases, marginalization techniques are not directly applicable. Here, another approach is suggested, which estimates sufficient statistics for speech features in the preprocessing (e.g. STFT) domain, propagates these statistics through the transforms from the spectrum to e.g. the MFCCs of a speech recognition system, and uses the estimated statistics for missing data speech recognition. With this approach, significant gains can be achieved in speech recognition rates, and in this context, time-frequency masking yields recognition rate improvements of more than 35% when compared to TF-masking based source separation.
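The idea of carrying feature statistics from the spectral domain into the cepstral domain can be illustrated with a small sketch. This is not the authors' method: the abstract does not say how the statistics are propagated, so plain Monte Carlo sampling is used here as a stand-in. The sketch assumes an independent Gaussian uncertainty per spectral bin, and the function names `mel_filterbank` and `propagate_to_cepstrum`, the filterbank design, and all parameter values are hypothetical.

```python
import numpy as np


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale (assumed design)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank


def propagate_to_cepstrum(mean_spec, var_spec, fbank, n_ceps, n_samples, rng, floor=1e-10):
    """Estimate mean and variance of cepstral features by sampling the spectrum.

    mean_spec, var_spec: per-bin mean and variance of one spectral frame
    (assumed Gaussian and independent across bins; an assumption of this sketch).
    """
    n_filters = fbank.shape[0]
    # DCT-II basis (unnormalized) mapping log filterbank energies to cepstra.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (n + 0.5) / n_filters)

    # Draw spectral samples and floor them so the logarithm stays defined.
    samples = rng.normal(mean_spec, np.sqrt(var_spec), size=(n_samples, mean_spec.size))
    samples = np.maximum(samples, floor)

    # Push every sample through filterbank -> log -> DCT.
    log_mel = np.log(np.maximum(samples @ fbank.T, floor))
    ceps = log_mel @ dct.T
    return ceps.mean(axis=0), ceps.var(axis=0)
```

A recognizer that accepts per-feature uncertainty could then take the returned cepstral mean and variance in place of a single point estimate; how those statistics enter the decoder is outside the scope of this sketch.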
Pages: 82 - 85
Number of pages: 4
Related papers
50 items in total
  • [31] Blind speech source separation via nonlinear time-frequency masking
    XU Shun CHEN Shaorong LIU Yulin (DSP Lab.
    Chinese Journal of Acoustics, 2008, (03) : 203 - 214
  • [32] A TIME-FREQUENCY BLIND SEPARATION METHOD FOR UNDERDETERMINED SPEECH MIXTURES
    Lv Yao Li Shuangtian(Institute of Acoustics
    Journal of Electronics(China), 2008, (05) : 702 - 708
  • [33] Underdetermined blind separation of audio sources from the time-frequency representation of their convolutive mixtures
    Aissa-El-Bey, Abdeldjalil
    Abed-Meraim, Karim
    Grenier, Yves
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PTS 1-3, PROCEEDINGS, 2007, : 153 - 156
  • [34] Online blind speech separation using multiple acoustic speaker tracking and time-frequency masking
    Pertila, P.
    COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03) : 683 - 702
  • [35] Binary and ratio time-frequency masks for robust speech recognition
    Srinivasan, Soundararajan
    Roman, Nicoleta
    Wang, DeLiang
    SPEECH COMMUNICATION, 2006, 48 (11) : 1486 - 1501
  • [36] Separation of Cardiorespiratory Sounds Using Time-Frequency Masking and Sparsity
    Shah, Ghafoor
    Papadias, Constantinos
    2013 18TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2013,
  • [37] Missing data techniques using voicing probability for robust automatic speech recognition
    Kim, LY
    Cho, HY
    Oh, YH
    ELECTRONICS LETTERS, 2001, 37 (11) : 723 - 724
  • [38] Robust speech recognition using missing data techniques in the prospect domain and fuzzy masks
    Van Segbroeck, Maarten
    Van Hamme, Hugo
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4393 - 4396
  • [39] Robust sparse time-frequency analysis for data missing scenarios
    Chen, Yingpin
    Huang, Yuming
    Song, Jianhua
    IET SIGNAL PROCESSING, 2023, 17 (01)
  • [40] Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions
    Dorothea Kolossa
    Ramon Fernandez Astudillo
    Eugen Hoffmann
    Reinhold Orglmeister
    EURASIP Journal on Audio, Speech, and Music Processing, 2010