Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques

Cited by: 32
Authors
Kolossa, D. [1]
Klimas, A. [1]
Orglmeister, R. [1]
Affiliation
[1] Tech Univ Berlin, D-10587 Berlin, Germany
Keywords
DOI
10.1109/ASPAA.2005.1540174
CLC Classification Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Time-frequency masking has emerged as a powerful technique for source separation of noisy and convolved speech mixtures. It has also been applied successfully to noisy speech recognition, see e.g. [1, 2]. But while adequate masking functions can yield significant SNR gains, speech recognition performance suffers from the nonlinear operations involved, so that the greatly improved SNR often contrasts with only slight improvements in the recognition rate. To address this problem, marginalization techniques have been used for speech recognition [3, 4], but they require speech recognition and source separation to be carried out in the same domain. However, source separation and de-noising are often carried out in the short-time Fourier transform (STFT) domain, whereas the most useful speech recognition features are, e.g., mel-frequency cepstral coefficients (MFCCs), LPC-cepstral coefficients and VQ features. In these cases, marginalization techniques are not directly applicable. Here, another approach is suggested, which estimates sufficient statistics for speech features in the preprocessing (e.g. STFT) domain, propagates these statistics through the transforms from the spectrum to, e.g., the MFCCs of a speech recognition system, and uses the estimated statistics for missing-data speech recognition. With this approach, significant gains can be achieved in speech recognition rates; in this context, time-frequency masking yields recognition rate improvements of more than 35% when compared to TF-masking-based source separation alone.
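
The processing chain described in the abstract can be illustrated with a short sketch. The following Python snippet is a minimal, illustrative implementation of the stages the abstract names: time-frequency masking of an STFT, per-bin mean/variance estimation for the target speech, and first-order propagation of those statistics through the mel filter bank, logarithm and DCT so that a missing-data (uncertainty-decoding) recognizer can use them. The binary mask rule, the variance proxy, the diagonal-Gaussian approximation and all function names are assumptions made for this sketch, not details taken from the paper.

# Minimal sketch (not the authors' implementation) of the chain in the abstract:
# (1) time-frequency masking of an STFT, (2) per-bin mean/variance estimates for
# the target speech, (3) first-order propagation of those statistics through
# mel filter bank -> log -> DCT, (4) a simple uncertainty-decoding likelihood
# as one form of missing-data scoring. All modeling choices here are
# illustrative assumptions.

import numpy as np


def stft(x, frame_len=512, hop=256):
    """One-sided windowed short-time Fourier transform, shape (frames, bins)."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def binary_mask(mix_spec, interference_spec, threshold_db=0.0):
    """Keep time-frequency bins where the mixture dominates an interference
    reference (a crude stand-in for the paper's mask estimation)."""
    snr_db = 20.0 * np.log10(np.abs(mix_spec) / (np.abs(interference_spec) + 1e-12) + 1e-12)
    return (snr_db > threshold_db).astype(float)


def mel_filterbank(n_mels, n_bins, sr):
    """Triangular mel filter bank, rows = filters, columns = FFT bins."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bin_pts = np.round(mel_to_hz(mel_pts) / (sr / 2.0) * (n_bins - 1)).astype(int)
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, mid, hi = bin_pts[m], bin_pts[m + 1], bin_pts[m + 2]
        if mid > lo:
            fb[m, lo:mid] = (np.arange(lo, mid) - lo) / (mid - lo)
        if hi > mid:
            fb[m, mid:hi] = (hi - np.arange(mid, hi)) / (hi - mid)
    return fb


def dct_matrix(n_ceps, n_mels):
    """Orthonormal DCT-II matrix mapping log-mel energies to cepstra."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    return np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (n + 0.5) / n_mels)


def propagate_statistics(mu_pow, var_pow, fb, dct):
    """Propagate per-bin power-spectrum mean/variance to MFCC mean/variance,
    assuming independent bins and a first-order expansion of the logarithm."""
    mu_mel = mu_pow @ fb.T
    var_mel = var_pow @ (fb ** 2).T
    mu_log = np.log(mu_mel + 1e-12)
    var_log = var_mel / (mu_mel + 1e-12) ** 2
    return mu_log @ dct.T, var_log @ (dct ** 2).T


def uncertainty_score(mu_feat, var_feat, model_mean, model_var):
    """Per-frame log-likelihood of a diagonal Gaussian whose variance is
    inflated by the feature uncertainty (one form of missing-data decoding)."""
    total_var = model_var + var_feat
    return -0.5 * np.sum(np.log(2.0 * np.pi * total_var)
                         + (mu_feat - model_mean) ** 2 / total_var, axis=-1)


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    target = rng.standard_normal(sr)           # stand-ins for real recordings
    noise = 0.5 * rng.standard_normal(sr)
    mix_spec, noise_spec = stft(target + noise), stft(noise)
    mask = binary_mask(mix_spec, noise_spec)
    mu_pow = (mask * np.abs(mix_spec)) ** 2            # masked power as the mean estimate
    var_pow = ((1.0 - mask) * np.abs(mix_spec)) ** 2   # removed power as a crude uncertainty proxy
    fb = mel_filterbank(24, mix_spec.shape[1], sr)
    mu_mfcc, var_mfcc = propagate_statistics(mu_pow, var_pow, fb, dct_matrix(13, 24))
    print(mu_mfcc.shape, var_mfcc.shape)               # (61, 13) each for 1 s of audio

In a real system, the per-bin means and variances would come from the separation stage itself (for instance from the reliability of the estimated masks), and the resulting MFCC means and variances would be passed to a missing-data or uncertainty-decoding HMM recognizer rather than the toy scoring function above.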
Pages: 82-85
Number of pages: 4