A companding front end for noise-robust automatic speech recognition

Cited by: 0
Authors
Guinness, J [1]
Raj, B [1]
Schmidt-Nielsen, B [1]
Turicchia, L [1]
Sarpeshkar, R [1]
Affiliations
[1] Mitsubishi Elect Res Labs, Cambridge, MA USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature computation modules for automatic speech recognition (ASR) systems have long been modeled on the human auditory system. Most current ASR systems model the critical band response and equal loudness characteristics of the auditory system. It has been postulated that more detailed models of the human auditory system can lead to more noise-robust speech recognition. An auditory phenomenon that is of particular relevance to robustness is simultaneous masking, whereby dominant frequencies suppress adjacent weaker frequencies. In this paper we present a companding-based model that mimics simultaneous masking in the front end of a speech recognizer. In an automotive digits recognition task, the front end improves word error rate by 4.0% (25% relative to Mel cepstra) at -5 dB SNR at the cost of a 1.7% increase at 15 dB SNR.
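The two figures quoted for the -5 dB SNR condition can be cross-checked with a little arithmetic. The sketch below assumes that the 4.0% is an absolute reduction in word error rate (percentage points) and that the 25% is the same reduction expressed relative to the Mel cepstra baseline; the abstract does not state the baseline itself, so the values computed here are inferences, not reported results. The function name is hypothetical.

```python
def implied_baseline_wer(absolute_gain_pct: float, relative_gain_pct: float) -> float:
    """Baseline WER (in %) implied by an absolute and a relative reduction.

    Assumes relative_gain = absolute_gain / baseline, so
    baseline = absolute_gain / relative_gain.
    """
    return absolute_gain_pct / (relative_gain_pct / 100.0)


# Figures from the abstract at -5 dB SNR (interpretation is an assumption).
absolute_gain = 4.0    # percentage points
relative_gain = 25.0   # percent of the Mel cepstra baseline

baseline_wer = implied_baseline_wer(absolute_gain, relative_gain)
companding_wer = baseline_wer - absolute_gain

print(f"implied Mel cepstra WER: {baseline_wer:.1f}%")
print(f"implied companding WER:  {companding_wer:.1f}%")
```

Under these assumptions the Mel cepstra baseline would sit at about 16% word error rate and the companding front end at about 12%. No comparable relative figure is given for the 15 dB SNR condition, so its baseline cannot be inferred the same way.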
Pages: 249-252
Page count: 4
Related papers
50 in total
  • [31] EXPLOITING SYNCHRONY SPECTRA AND DEEP NEURAL NETWORKS FOR NOISE-ROBUST AUTOMATIC SPEECH RECOGNITION
    Ma, Ning
    Marxer, Ricard
    Barker, Jon
    Brown, Guy J.
    [J]. 2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015 : 490 - 495
  • [32] An Efficient Noise-Robust Automatic Speech Recognition System using Artificial Neural Networks
    Gupta, Santosh
    Bhurchandi, Kishor M.
    Keskar, Avinash G.
    [J]. 2016 INTERNATIONAL CONFERENCE ON COMMUNICATION AND SIGNAL PROCESSING (ICCSP), VOL. 1, 2016 : 1873 - 1877
  • [33] An engineering model of the masking for the noise-robust speech recognition
    Park, KY
    Lee, SY
    [J]. NEUROCOMPUTING, 2003, 52-4 : 615 - 620
  • [34] Two-stage deep spectrum fusion for noise-robust end-to-end speech recognition
    Fan, Cunhang
    Ding, Mingming
    Yi, Jiangyan
    Li, Jinpeng
    Lv, Zhao
    [J]. APPLIED ACOUSTICS, 2023, 212
  • [35] Incorporating a Generative Front-end Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition
    Kundu, Souvik
    Sim, Khe Chai
    Gales, Mark
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016 : 2359 - 2363
  • [36] Front-End Feature Compensation for Noise Robust Speech Emotion Recognition
    Pandharipande, Meghna
    Chakraborty, Rupayan
    Panda, Ashish
    Das, Biswajit
    Kopparapu, Sunil Kumar
    [J]. 2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019
  • [37] Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition
    Shimada, Kazuki
    Bando, Yoshiaki
    Mimura, Masato
    Itoyama, Katsutoshi
    Yoshii, Kazuyoshi
    Kawahara, Tatsuya
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (05) : 960 - 971
  • [38] Noise-robust speech recognition based on difference of power spectrum
    Xu, JF
    Wei, G
    [J]. ELECTRONICS LETTERS, 2000, 36 (14) : 1247 - 1248
  • [39] Deep Maxout Networks Applied to Noise-Robust Speech Recognition
    de-la-Calle-Silos, F.
    Gallardo-Antolin, A.
    Pelaez-Moreno, C.
    [J]. ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2014, 2014, 8854 : 109 - 118
  • [40] On the temporal decorrelation of feature parameters for noise-robust speech recognition
    Jung, HY
    Lee, SY
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (04) : 407 - 416