Time-Reversal Enhancement Network With Cross-Domain Information for Noise-Robust Speech Recognition

Cited by: 0
Authors
Chao, Fu-An [1]
Hung, Jeih-Weih [3]
Sheu, Tommy [4]
Chen, Berlin [2]
Affiliations
[1] Natl Taiwan Normal Univ, Taipei 11677, Taiwan
[2] Natl Taiwan Normal Univ, Comp Sci & Informat Engn Dept, Taipei 11677, Taiwan
[3] Natl Chi Nan Univ, Dept Elect Engn, Puli 54516, Taiwan
[4] Delta Elect Inc, Delta Management Syst DMS Dept, Taipei 11491, Taiwan
Keywords
Feature extraction; Convolutional neural networks; Spectrogram; Noise measurement; Speech enhancement; Estimation; Time-domain analysis
DOI
10.1109/MMUL.2021.3139302
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Due to the enormous progress in deep learning, speech enhancement (SE) techniques have shown promising efficacy and play a pivotal role prior to an automatic speech recognition (ASR) system to mitigate the noise effects. In this article, we put forward a novel cross-domain time-reversal enhancement network (CD-TENET). CD-TENET leverages the time-reversed version of a speech signal and two effective features that consider the phase information of a speech signal in the time domain and the frequency domain, respectively, to promote SE performance for noise-robust ASR. Extensive experiments demonstrate that CD-TENET can not only recover the original speech effectively but also improve both SE and ASR performance simultaneously. More surprisingly, the proposed CD-TENET method can offer a marked relative word error rate reduction on test utterances of scenarios contaminated with unseen noises when compared to a strong baseline with the multicondition training setting.
Pages: 114-124
Page count: 11
Related papers
50 in total
  • [21] Noise-robust speech recognition in mobile network based on convolution neural networks
    Bouchakour, Lallouani
    Debyeche, Mohamed
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25 (01) : 269 - 277
  • [23] An engineering model of the masking for the noise-robust speech recognition
    Park, KY
    Lee, SY
    NEUROCOMPUTING, 2003, 52-4 : 615 - 620
  • [24] FLEXIBLE MULTICHANNEL SPEECH ENHANCEMENT FOR NOISE-ROBUST FRONTEND
    Jukic, Ante
    Balam, Jagadeesh
    Ginsburg, Boris
    2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023,
  • [25] A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition
    Zhu, Qiu-Shi
    Zhang, Jie
    Zhang, Zi-Qiang
    Dai, Li-Rong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1927 - 1939
  • [26] Linear Prediction Filtering on Cepstral Time Series for Noise-Robust Speech Recognition
    Hsieh, Hsin-Ju
    Jheng, Jhih-Hao
    Lin, Jung-shan
    Hung, Jeih-weih
    2016 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-TAIWAN (ICCE-TW), 2016, : 311 - 312
  • [27] Speech Enhancement for Noise-Robust Speech Synthesis using Wasserstein GAN
    Adiga, Nagaraj
    Pantazis, Yannis
    Tsiaras, Vassilis
    Stylianou, Yannis
    INTERSPEECH 2019, 2019, : 1821 - 1825
  • [28] Factorial Speech Processing Models for Noise-Robust Automatic Speech Recognition
    Khademian, Mahdi
    Homayounpour, Mohammad Mehdi
    2015 23RD IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2015, : 637 - 642
  • [29] Agricultural price information acquisition using noise-robust Mandarin auto speech recognition
    Xu J.
    Zhu Y.
    Xu P.
    Ma D.
    International Journal of Speech Technology, 2018, 21 (3) : 681 - 688
  • [30] Investigating Cross-Domain Losses for Speech Enhancement
    Abdulatif, Sherif
    Armanious, Karim
    Sajeev, Jayasankar T.
    Guirguis, Karim
    Yang, Bin
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 411 - 415