On the integration of time-frequency masking speech separation and recognition in underdetermined environments

Cited by: 0
Authors
Jafari, Ingrid [1]
Haque, Serajul [1]
Togneri, Roberto [1]
Nordholm, Sven [2]
Affiliations
[1] Univ Western Australia, Crawley, WA 6009, Australia
[2] Curtin Univ, Perth, WA 6845, Australia
Source
2012 CONFERENCE RECORD OF THE FORTY SIXTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR) | 2012
Funding
Australian Research Council
Keywords
SPARSE SOURCE SEPARATION; BLIND;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The successful application of automatic speech recognition systems in the real world is conditional on their ability to handle realistic environments with unfavorable conditions such as reverberation and multiple sources of interference. Previous research has identified time-frequency masking based approaches to blind source separation as viable for multisource reverberant source separation. It is proposed that the use of such separation techniques as a front-end to speech recognition will improve recognition accuracy. Experimental evaluations confirmed the hypothesis, with an improvement in recognition accuracy of over 20% at a reverberation time of RT60 = 300 ms; this indicates the potential for future research in this field.
Pages: 1613 - 1617
Page count: 5
Related papers
50 in total
  • [21] Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions
    Dorothea Kolossa
    Ramon Fernandez Astudillo
    Eugen Hoffmann
    Reinhold Orglmeister
    EURASIP Journal on Audio, Speech, and Music Processing, 2010
  • [22] Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions
    Kolossa, Dorothea
    Astudillo, Ramon Fernandez
    Hoffmann, Eugen
    Orglmeister, Reinhold
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2010
  • [23] Label-Driven Time-Frequency Masking for Robust Speech Command Recognition
    Soni, Meet
    Sheikh, Imran
    Kopparapu, Sunil Kumar
    TEXT, SPEECH, AND DIALOGUE (TSD 2019), 2019, 11697 : 341 - 351
  • [24] Maximizing environmental sound recognition and speech intelligibility using time-frequency masking
    Johnson, Eric M.
    Healy, Eric W.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (03):
  • [25] Underdetermined blind separation of overlapped speech mixtures in time-frequency domain with estimated number of sources
    Zhang, Haijian
    Hua, Guang
    Yu, Lei
    Cai, Yunlong
    Bi, Guoan
    SPEECH COMMUNICATION, 2017, 89 : 1 - 16
  • [26] Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks
    Guo, Xinyu
    Ou, Shifeng
    Gao, Meng
    Gao, Ying
    2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020 : 445 - 450
  • [27] SPATIAL AND COHERENCE CUES BASED TIME-FREQUENCY MASKING FOR BINAURAL REVERBERANT SPEECH SEPARATION
    Alinaghi, Atiyeh
    Wang, Wenwu
    Jackson, Philip J. B.
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013 : 684 - 688
  • [28] Reverberant speech separation with probabilistic time-frequency masking for B-format recordings
    Chen, Xiaoyi
    Wang, Wenwu
    Wang, Yingmin
    Zhong, Xionghu
    Alinaghi, Atiyeh
    SPEECH COMMUNICATION, 2015, 68 : 41 - 54
  • [29] Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
    Luo, Yi
    Mesgarani, Nima
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (08) : 1256 - 1266
  • [30] ACOUSTIC VECTOR SENSOR BASED REVERBERANT SPEECH SEPARATION WITH PROBABILISTIC TIME-FREQUENCY MASKING
    Zhong, Xionghu
    Chen, Xiaoyi
    Wang, Wenwu
    Alinaghi, Atiyeh
    Premkumar, A. B.
    2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2013