An efficient joint training model for monaural noisy-reverberant speech recognition

Authors
Lian, Xiaoyu [1]
Xia, Nan [1]
Dai, Gaole [1]
Yang, Hongqin [1]
Affiliations
[1] School of Information Science and Engineering, Dalian Polytechnic University, Dalian, Liaoning 116034, China
Keywords
Background noise
DOI
10.1016/j.apacoust.2024.110322
Abstract
Noise and reverberation can seriously reduce speech quality and intelligibility, degrading the performance of downstream speech recognition tasks. This paper constructs a joint training network for speech recognition in monaural noisy-reverberant environments. In the speech enhancement model, a complex-valued channel and temporal-frequency attention (CCTFA) module is integrated to focus on the key features of the complex spectrum, and the CCTFA network (CCTFANet) is built on it to reduce the influence of noise and reverberation. In the speech recognition model, an element-wise linear attention (EWLA) module is proposed to make the attention complexity linear and to reduce the number of parameters and computations the attention module requires. The EWLA Conformer (EWLAC) is then constructed as an efficient end-to-end speech recognition model. On an open-source dataset, joint training of CCTFANet with EWLAC reduces the character error rate (CER) by 3.27%. Compared with other speech recognition models, EWLAC maintains CER while achieving a much lower parameter count, lower computational overhead, and higher inference speed. © 2024 Elsevier Ltd
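The abstract reports results as CER (character error rate). As a minimal sketch of how that metric is conventionally computed, assuming the standard definition (character-level Levenshtein distance divided by the reference length), the Python below may help; the function names `edit_distance` and `cer` are illustrative and are not taken from the paper, and the abstract does not state whether the 3.27% reduction is absolute or relative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the first i-1 items of ref and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis is i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance normalised by the reference length."""
    if len(reference) == 0:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `cer("abcd", "abxd")` counts one substitution against four reference characters. Under this definition the value can exceed 1.0 when the hypothesis contains many insertions.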
Related papers
50 in total
  • [21] Investigation of Monaural Front-End Processing for Robust Speech Recognition Without Retraining or Joint-Training
    Du, Zhihao
    Zhang, Xueliang
    Han, Jiqing
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 249 - 254
  • [22] SPATIAL DIFFUSENESS FEATURES FOR DNN-BASED SPEECH RECOGNITION IN NOISY AND REVERBERANT ENVIRONMENTS
    Schwarz, Andreas
    Huemmer, Christian
    Maas, Roland
    Kellermann, Walter
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4380 - 4384
  • [23] AMPLITUDE MODULATION SPECTROGRAM BASED FEATURES FOR ROBUST SPEECH RECOGNITION IN NOISY AND REVERBERANT ENVIRONMENTS
    Moritz, Niko
    Anemueller, Joern
    Kollmeier, Birger
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5492 - 5495
  • [24] OPTIMIZING SPECTRAL SUBTRACTION AND WIENER FILTERING FOR ROBUST SPEECH RECOGNITION IN REVERBERANT AND NOISY CONDITIONS
    Gomez, Randy
    Kawahara, Tatsuya
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4566 - 4569
  • [25] Experiments of speech recognition in a noisy and reverberant environment using a microphone array and HMM adaptation
    Giuliani, D
    Omologo, M
    Svaizer, P
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1329 - 1332
  • [26] Speech Recognition by Denoising and Dereverberation Based on Spectral Subtraction in a Real Noisy Reverberant Environment
    Odani, Kyohei
    Wang, Longbiao
    Kai, Atsuhiko
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1250 - 1253
  • [27] A discriminative and robust training algorithm for noisy speech recognition
    Hong, WT
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 8 - 11
  • [28] Noisy training for deep neural networks in speech recognition
    Shi Yin
    Chao Liu
    Zhiyong Zhang
    Yiye Lin
    Dong Wang
    Javier Tejedor
    Thomas Fang Zheng
    Yinguo Li
    EURASIP Journal on Audio, Speech, and Music Processing, 2015
  • [29] Improved Noisy Student Training for Automatic Speech Recognition
    Park, Daniel S.
    Zhang, Yu
    Jia, Ye
    Han, Wei
    Chiu, Chung-Cheng
    Li, Bo
    Wu, Yonghui
    Le, Quoc, V
    INTERSPEECH 2020, 2020, : 2817 - 2821
  • [30] Noisy training for deep neural networks in speech recognition
    Yin, Shi
    Liu, Chao
    Zhang, Zhiyong
    Lin, Yiye
    Wang, Dong
    Tejedor, Javier
    Zheng, Thomas Fang
    Li, Yinguo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015, : 1 - 14