Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques

Cited by: 32
Authors
Kolossa, D [1]
Klimas, A [1]
Orglmeister, R [1]
Affiliation
[1] Tech Univ Berlin, D-10587 Berlin, Germany
DOI
10.1109/ASPAA.2005.1540174
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Time-frequency masking has emerged as a powerful technique for the source separation of noisy and convolved speech mixtures. It has also been applied successfully to noisy speech recognition; see, e.g., [1, 2]. However, while adequate masking functions can yield significant SNR gains, speech recognition performance suffers from the nonlinear operations involved, so the greatly improved SNR is often accompanied by only slight improvements in the recognition rate. To address this problem, marginalization techniques have been used for speech recognition [3, 4], but they require speech recognition and source separation to be carried out in the same domain. Source separation and de-noising, however, are often performed in the short-time Fourier transform (STFT) domain, whereas the most useful speech recognition features are, e.g., mel-frequency cepstral coefficients (MFCCs), LPC cepstral coefficients, and VQ features. In these cases, marginalization techniques are not directly applicable. Here, another approach is suggested: it estimates sufficient statistics for the speech features in the preprocessing domain (e.g., the STFT domain), propagates these statistics through the transforms from the spectrum to the recognition features (e.g., the MFCCs) of a speech recognition system, and uses the estimated statistics for missing-data speech recognition. With this approach, significant gains can be achieved in speech recognition rates; in this context, time-frequency masking yields recognition rate improvements of more than 35% when compared to TF-masking-based source separation.
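The abstract only outlines the pipeline. Below is a minimal NumPy sketch of the general idea under explicitly illustrative assumptions. It performs three steps: it attaches a mean and a variance to a time-frequency-masked STFT frame, propagates both through power, mel filterbank, log and DCT to the MFCC domain, and then scores the result with an uncertainty-aware Gaussian likelihood.

Several parts of the sketch are assumptions rather than the authors' method:

- **Uncertainty model (hypothetical).** Each STFT bin is taken to belong to the target speaker with a probability equal to its soft mask value.
- **Propagation method (swapped in).** Statistics are propagated by Monte Carlo sampling, whereas the paper may use an analytic approximation.
- **Decoding rule (swapped in).** The feature variance is added to the model variance, a common "uncertainty decoding" rule that may differ from the paper's.
- **Names and parameters (hypothetical).** All function names and parameter values are illustrative only.

```python
import numpy as np


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank, shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bin_pts = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bin_pts[i - 1], bin_pts[i], bin_pts[i + 1]
        for k in range(left, center):  # rising edge of triangle i
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of triangle i
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_matrix(n_ceps, n_mels):
    """Orthonormal DCT-II matrix, shape (n_ceps, n_mels)."""
    n = np.arange(n_mels)
    k = np.arange(n_ceps)[:, None]
    d = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    d[0] /= np.sqrt(2.0)
    return d


def propagate_to_mfcc(stft_frame, mask, fb, dct, n_samples=2000, rng=None, eps=1e-10):
    """Monte Carlo estimate of the MFCC mean and variance for one frame.

    Hypothetical model: bin k independently keeps its observed value
    stft_frame[k] with probability mask[k] and is zero otherwise.
    """
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random((n_samples, stft_frame.size)) < mask  # Bernoulli draws per bin
    power = (np.abs(stft_frame) ** 2) * keep                # sampled power spectra
    logmel = np.log(power @ fb.T + eps)                     # mel filterbank + log
    mfcc = logmel @ dct.T                                   # DCT to the cepstral domain
    return mfcc.mean(axis=0), mfcc.var(axis=0)


def uncertain_log_likelihood(mu_x, var_x, mu_model, var_model):
    """Diagonal-Gaussian log-likelihood with feature variance added to model variance."""
    total_var = var_model + var_x
    return -0.5 * np.sum(np.log(2.0 * np.pi * total_var) + (mu_x - mu_model) ** 2 / total_var)


# Usage on one synthetic frame (random numbers, not real speech).
rng = np.random.default_rng(0)
n_fft, sr = 512, 16000
fb = mel_filterbank(n_mels=24, n_fft=n_fft, sr=sr)
dct = dct_matrix(n_ceps=13, n_mels=24)
frame = rng.normal(size=n_fft // 2 + 1) + 1j * rng.normal(size=n_fft // 2 + 1)
mask = rng.uniform(0.0, 1.0, size=frame.size)  # soft TF mask for this frame
mu, var = propagate_to_mfcc(frame, mask, fb, dct, rng=rng)
score = uncertain_log_likelihood(mu, var, np.zeros(13), np.ones(13))
```

Monte Carlo sampling is used here because it handles any chain of nonlinear transforms (power, log, DCT) without deriving closed forms. Its cost, however, grows with `n_samples`, a cost that analytic or unscented-transform propagation would avoid. With a mask of all ones, the propagated variance is zero and the likelihood reduces to the ordinary Gaussian score.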
Pages: 82-85
Number of pages: 4