Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques

Cited by: 32
Authors
Kolossa, D. [1]
Klimas, A. [1]
Orglmeister, R. [1]
Affiliation
[1] Tech Univ Berlin, D-10587 Berlin, Germany
DOI
10.1109/ASPAA.2005.1540174
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Time-frequency masking has emerged as a powerful technique for the source separation of noisy and convolved speech mixtures. It has also been applied successfully to noisy speech recognition; see, e.g., [1, 2]. Adequate masking functions can yield significant SNR gains. However, the nonlinear operations involved degrade speech recognition performance, so the greatly improved SNR is often accompanied by only slight improvements in the recognition rate. To address this problem, marginalization techniques have been used for speech recognition [3, 4], but they require speech recognition and source separation to be carried out in the same domain. Source separation and de-noising are often carried out in the short-time Fourier transform (STFT) domain. In contrast, many of the most useful speech recognition features, such as mel-frequency cepstral coefficients (MFCCs), LPC cepstral coefficients, and VQ features, are computed in other domains. In these cases, marginalization techniques are not directly applicable. Here, another approach is suggested. It estimates sufficient statistics for the speech features in the preprocessing (e.g., STFT) domain and propagates these statistics through the transforms from the spectrum to the recognizer's features (e.g., MFCCs). It then uses the estimated statistics for missing data speech recognition. With this approach, significant gains can be achieved in speech recognition rates; in this context, time-frequency masking yields recognition rate improvements of more than 35% when compared to TF-masking-based source separation.
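The abstract's pipeline has four steps: mask in the STFT domain, estimate feature statistics there, push them through the spectrum-to-MFCC transforms, and then decode with those statistics. It can be illustrated with the minimal sketch below. This is not the authors' method, and it rests on several assumptions.

- The statistical model is hypothetical. A bin with mask 1 is taken to hold the observed magnitude exactly. A bin with mask 0 is taken to hold a speech magnitude drawn uniformly between 0 and the observed mixture magnitude.
- Plain Monte Carlo sampling stands in for whatever propagation scheme the paper uses.
- The filterbank uses linearly spaced triangles rather than true mel spacing.
- The decoding rule is one common uncertainty-decoding variant: feature variance is added to model variance.

All function names (`mel_like_filterbank`, `mfcc_from_magnitudes`, `propagate_uncertainty`, `uncertain_log_likelihood`) and all numeric values are illustrative.

```python
import math
import random


def mel_like_filterbank(n_bins, n_bands):
    """Triangular filterbank with linearly spaced centres (a simplification of mel spacing)."""
    edges = [i * (n_bins - 1) / (n_bands + 1) for i in range(n_bands + 2)]
    bank = []
    for b in range(1, n_bands + 1):
        lo, mid, hi = edges[b - 1], edges[b], edges[b + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge of the triangle
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge of the triangle
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_from_magnitudes(mags, bank, n_ceps, floor=1e-10):
    """|STFT| -> power -> filterbank energies -> log -> unnormalised DCT-II."""
    power = [m * m for m in mags]
    log_e = [math.log(max(sum(w * p for w, p in zip(row, power)), floor)) for row in bank]
    n = len(log_e)
    return [sum(log_e[j] * math.cos(math.pi * c * (j + 0.5) / n) for j in range(n))
            for c in range(n_ceps)]


def propagate_uncertainty(mixture_mags, mask, bank, n_ceps, n_samples=2000, seed=0):
    """Monte Carlo estimate of the mean and variance of each cepstral coefficient.

    Hypothetical STFT-domain model: a bin with mask 1 holds the observed magnitude
    exactly; a bin with mask 0 holds a speech magnitude drawn uniformly from
    [0, observed mixture magnitude].
    """
    rng = random.Random(seed)
    sums = [0.0] * n_ceps
    sq_sums = [0.0] * n_ceps
    for _ in range(n_samples):
        mags = [y if m else rng.uniform(0.0, y) for y, m in zip(mixture_mags, mask)]
        for c, v in enumerate(mfcc_from_magnitudes(mags, bank, n_ceps)):
            sums[c] += v
            sq_sums[c] += v * v
    mean = [s / n_samples for s in sums]
    # Clamp tiny negative values caused by floating-point cancellation.
    var = [max(q / n_samples - mu * mu, 0.0) for q, mu in zip(sq_sums, mean)]
    return mean, var


def uncertain_log_likelihood(feat_mean, feat_var, model_mean, model_var):
    """Diagonal-Gaussian log-likelihood with feature variance added to model variance."""
    ll = 0.0
    for x, fv, mu, mv in zip(feat_mean, feat_var, model_mean, model_var):
        v = mv + fv
        ll -= 0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)
    return ll


if __name__ == "__main__":
    mixture = [1.0 + 0.1 * k for k in range(16)]   # hypothetical |Y(k)| for one frame
    mask = [1 if k % 3 else 0 for k in range(16)]   # hypothetical binary TF mask
    bank = mel_like_filterbank(16, 4)
    mean, var = propagate_uncertainty(mixture, mask, bank, n_ceps=3)
    print("MFCC mean:", [round(m, 3) for m in mean])
    print("MFCC var: ", [round(v, 4) for v in var])
    # Score against a hypothetical single-state model with zero mean and unit variance.
    print("log-lik:", uncertain_log_likelihood(mean, var, [0.0] * 3, [1.0] * 3))
```

Masked bins only widen the feature variance and never remove the feature. As a result, the recognizer down-weights uncertain coefficients instead of trusting a possibly distorted point estimate. An unscented transform or an analytic approximation could replace the sampling loop at lower cost. The Monte Carlo version is used here only because it is easy to check.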
Pages: 82-85
Page count: 4