Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions

Cited by: 25
Authors
Kolossa, Dorothea [1]
Astudillo, Ramon Fernandez [1]
Hoffmann, Eugen [1]
Orglmeister, Reinhold [1]
Affiliations
[1] Elect & Med Signal Proc Grp, D-10587 Berlin, Germany
Keywords
SEPARATION; MIXTURES; DOMAIN
DOI
10.1155/2010/651420
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
When a number of speakers are simultaneously active, for example in meetings or noisy public places, the sources of interest need to be separated from interfering speakers and from each other in order to be robustly recognized. Independent component analysis (ICA) has proven a valuable tool for this purpose. However, ICA outputs can still contain strong residual components of the interfering speakers whenever noise or reverberation is high. In such cases, nonlinear postprocessing can be applied to the ICA outputs, for the purpose of reducing remaining interferences. In order to improve robustness to the artefacts and loss of information caused by this process, recognition can be greatly enhanced by considering the processed speech feature vector as a random variable with time-varying uncertainty, rather than as deterministic. The aim of this paper is to show the potential to improve recognition of multiple overlapping speech signals through nonlinear postprocessing together with uncertainty-based decoding techniques.
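The nonlinear postprocessing step mentioned in the abstract can be illustrated with a minimal sketch of binary time-frequency masking applied to already separated signals. This is not taken from the paper: the function names `binary_masks` and `apply_masks`, the `margin_db` and `floor` parameters, and the choice of a simple amplitude-comparison mask are all assumptions made for illustration, and the input is assumed to be the complex STFT of each ICA output.

```python
import numpy as np


def binary_masks(separated, margin_db=0.0):
    """Return one boolean mask per source (hypothetical helper).

    separated: complex array of shape (n_sources, n_freq, n_frames),
               assumed to hold the STFT of each ICA output.
    A time-frequency bin is kept for source i only where its level
    exceeds that of every other output by more than margin_db.
    """
    level_db = 20.0 * np.log10(np.abs(separated) + 1e-12)
    masks = np.zeros(separated.shape, dtype=bool)
    for i in range(separated.shape[0]):
        # Strongest competing output in each time-frequency bin.
        others = np.delete(level_db, i, axis=0).max(axis=0)
        masks[i] = level_db[i] > others + margin_db
    return masks


def apply_masks(separated, masks, floor=0.0):
    """Keep dominant bins; scale the remaining bins by `floor`."""
    return np.where(masks, separated, floor * separated)


if __name__ == "__main__":
    # Two sources, one frequency bin, two frames (toy values).
    spec = np.array([[[1.0, 0.1]], [[0.2, 2.0]]], dtype=complex)
    masks = binary_masks(spec)
    print(apply_masks(spec, masks).real)
```

Bins that the mask discards are exactly where information is lost, which is why the abstract proposes treating the resulting features as random variables with time-varying uncertainty rather than as deterministic values; a nonzero `floor` is one simple way to soften the mask in this sketch.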
Pages: 13
Related papers
50 in total
  • [1] Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions
    Dorothea Kolossa
    Ramon Fernandez Astudillo
    Eugen Hoffmann
    Reinhold Orglmeister
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2010
  • [2] Underdetermined DOA Estimation via Independent Component Analysis and Time-Frequency Masking
    Jancovic, Peter
    Zou, Xin
    Kokuer, Munevver
    [J]. JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING, 2010, 2010
  • [3] Time-Frequency Masking For Large Scale Robust Speech Recognition
    Wang, Yuxuan
    Misra, Ananya
    Chin, Kean K.
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 2469-2473
  • [4] Cepstral representation of speech motivated by time-frequency masking: An application to speech recognition
    Aikawa, K
    Singer, H
    Kawahara, H
    Tohkura, Y
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1996, 100(1): 603-614
  • [5] On time-frequency masking in voiced speech
    Skoglund, J
    Kleijn, WB
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8(4): 361-369
  • [6] Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition
    Soni, Meet
    Panda, Ashish
    [J]. INTERSPEECH 2019, 2019: 426-430
  • [7] On the integration of time-frequency masking speech separation and recognition in underdetermined environments
    Jafari, Ingrid
    Haque, Serajul
    Togneri, Roberto
    Nordholm, Sven
    [J]. 2012 CONFERENCE RECORD OF THE FORTY SIXTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2012: 1613-1617
  • [8] Label-Driven Time-Frequency Masking for Robust Speech Command Recognition
    Soni, Meet
    Sheikh, Imran
    Kopparapu, Sunil Kumar
    [J]. TEXT, SPEECH, AND DIALOGUE (TSD 2019), 2019, 11697: 341-351
  • [9] Maximizing environmental sound recognition and speech intelligibility using time-frequency masking
    Johnson, Eric M.
    Healy, Eric W.
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153(3)
  • [10] Robust Automatic Speech Recognition System Based on Using Adaptive Time-Frequency Masking
    Gouda, Ahmed Mostafa
    Tamazin, Mohamed
    Khedr, Mohamed
    [J]. PROCEEDINGS OF 2016 11TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2016: 181-186