Time-Reversal Enhancement Network With Cross-Domain Information for Noise-Robust Speech Recognition

Cited by: 0
Authors
Chao, Fu-An [1]
Hung, Jeih-Weih [3]
Sheu, Tommy [4]
Chen, Berlin [2]
Affiliations
[1] Natl Taiwan Normal Univ, Taipei 11677, Taiwan
[2] Natl Taiwan Normal Univ, Comp Sci & Informat Engn Dept, Taipei 11677, Taiwan
[3] Natl Chi Nan Univ, Dept Elect Engn, Puli 54516, Taiwan
[4] Delta Elect Inc, Delta Management Syst DMS Dept, Taipei 11491, Taiwan
Keywords
Feature extraction; Convolutional neural networks; Spectrogram; Noise measurement; Speech enhancement; Estimation; Time-domain analysis
DOI
10.1109/MMUL.2021.3139302
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Owing to enormous progress in deep learning, speech enhancement (SE) techniques have shown promising efficacy and play a pivotal role as a front end to automatic speech recognition (ASR) systems, mitigating the effects of noise. In this article, we put forward a novel cross-domain time-reversal enhancement network (CD-TENET). CD-TENET leverages the time-reversed version of a speech signal together with two effective features that capture the phase information of the signal in the time domain and the frequency domain, respectively, to improve SE performance for noise-robust ASR. Extensive experiments demonstrate that CD-TENET not only recovers the original speech effectively but also improves SE and ASR performance simultaneously. More surprisingly, compared with a strong baseline trained under the multicondition setting, CD-TENET offers a marked relative word error rate reduction on test utterances contaminated with unseen noises.
Pages: 114-124
Number of pages: 11