An efficient joint training model for monaural noisy-reverberant speech recognition

Cited by: 0
Authors
Lian, Xiaoyu [1]
Xia, Nan [1]
Dai, Gaole [1]
Yang, Hongqin [1]
Affiliation
[1] School of Information Science and Engineering, Dalian Polytechnic University, Dalian 116034, Liaoning, China
Keywords
Background noise
DOI
10.1016/j.apacoust.2024.110322
Abstract
Noise and reverberation can severely degrade speech quality and intelligibility, hurting the performance of downstream speech recognition. This paper constructs a jointly trained network for speech recognition in monaural noisy-reverberant environments. In the speech enhancement model, a complex-valued channel and temporal-frequency attention (CCTFA) module is integrated to focus on the key features of the complex spectrum, and the resulting CCTFA network (CCTFANet) reduces the influence of noise and reverberation. In the speech recognition model, an element-wise linear attention (EWLA) module is proposed to linearize the attention complexity and reduce the number of parameters and computations required by the attention module; the EWLA Conformer (EWLAC) is then constructed as an efficient end-to-end speech recognition model. On an open-source dataset, joint training of CCTFANet with EWLAC reduces the CER by 3.27%. Compared with other speech recognition models, EWLAC maintains a comparable CER while requiring far fewer parameters, lower computational overhead, and achieving higher inference speed. © 2024 Elsevier Ltd
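The abstract's efficiency claim rests on EWLA replacing the quadratic softmax attention inside the Conformer with an attention whose cost is linear in the number of frames. The paper's exact EWLA formulation is not reproduced here; the sketch below is a generic kernelized linear attention in PyTorch that only illustrates the complexity reduction, namely that reordering (QKᵀ)V into Q(KᵀV) avoids ever forming the O(T²) score matrix. The class name, the ELU+1 feature map, and the tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a generic linear attention (assumed, not the paper's EWLA):
# complexity is O(T * d^2) in sequence length T instead of O(T^2 * d).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ElementWiseLinearAttention(nn.Module):
    """Hypothetical linear-attention block for (batch, time, dim) features."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Positive feature map (ELU + 1) so the reordered product stays valid.
        q = F.elu(self.q_proj(x)) + 1.0
        k = F.elu(self.k_proj(x)) + 1.0
        v = self.v_proj(x)
        # Reorder (Q K^T) V -> Q (K^T V): sum over time first, linear in T.
        kv = torch.einsum("btd,bte->bde", k, v)                      # (B, d, d)
        z = 1.0 / (torch.einsum("btd,bd->bt", q, k.sum(dim=1)) + 1e-6)
        out = torch.einsum("btd,bde,bt->bte", q, kv, z)              # (B, T, d)
        return self.out_proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 200, 256)   # 2 utterances, 200 frames, 256-dim features
    attn = ElementWiseLinearAttention(256)
    print(attn(x).shape)           # torch.Size([2, 200, 256])
```

In a jointly trained setup of the kind the abstract describes, a block like this would sit inside each Conformer layer of the recognizer, with the enhancement front-end and the ASR back-end optimized together so that enhancement errors are penalized through the recognition loss.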
Related Papers (50 total)
  • [1] Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation
    Wang, Zhong-Qiu
    Wichern, Gordon
    Le Roux, Jonathan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3476 - 3490
  • [2] Dual branch deep interactive UNet for monaural noisy-reverberant speech enhancement
    Zhang, Zehua
    Xu, Shiyun
    Zhuang, Xuyi
    Qian, Yukun
    Wang, Mingjiang
    APPLIED ACOUSTICS, 2023, 212
  • [3] Two-Stage Deep Learning for Noisy-Reverberant Speech Enhancement
    Zhao, Yan
    Wang, Zhong-Qiu
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (01) : 53 - 62
  • [4] Noisy-reverberant Speech Enhancement Using DenseUNet with Time-frequency Attention
    Zhao, Yan
    Wang, DeLiang
    INTERSPEECH 2020, 2020, : 3261 - 3265
  • [5] Multi-branch Learning for Noisy and Reverberant Monaural Speech Separation
    Ma, Chao
    Li, Dongmei
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1247 - 1251
  • [6] Speech Emotion Recognition in Noisy and Reverberant Environments
    Heracleous, Panikos
    Yasuda, Keiji
    Sugaya, Fumiaki
    Yoneyama, Akio
    Hashimoto, Masayuki
    2017 SEVENTH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2017, : 262 - 266
  • [7] ROBUST SPEECH RECOGNITION IN UNKNOWN REVERBERANT AND NOISY CONDITIONS
    Hsiao, Roger
    Ma, Jeff
    Hartmann, William
    Karafiat, Martin
    Grezl, Frantisek
    Burget, Lukas
    Szoke, Igor
    Cernocky, Jan Honza
    Watanabe, Shinji
    Chen, Zhuo
    Mallidi, Sri Harish
    Hermansky, Hynek
    Tsakalidis, Stavros
    Schwartz, Richard
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 533 - 538
  • [8] Techniques for robust speech recognition in noisy and reverberant conditions
    Brown, GJ
    Palomäki, KJ
    SPEECH SEPARATION BY HUMANS AND MACHINES, 2005, : 213 - 220
  • [9] SPEECH RECOGNITION IN A NOISY AND REVERBERANT ENVIRONMENT WITH AND WITHOUT EARMUFFS
    PEKKARINEN, E
    VILJANEN, V
    SALMIVALLI, A
    SUONPAA, J
    AUDIOLOGY, 1990, 29 (05): : 286 - 293
  • [10] Speech Enhancement and Recognition of Compressed Speech Signal in Noisy Reverberant Conditions
    Suman, Maloji
    Khan, Habibulla
    Latha, M. Madhavi
    Kumari, Devarakonda Aruna
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS 2012 (INDIA 2012), 2012, 132 : 379 - +