Single-channel Dereverberation for Distant-Talking Speech Recognition by Combining Denoising Autoencoder and Temporal Structure Normalization

Authors
Yuma Ueda
Longbiao Wang
Atsuhiko Kai
Xiong Xiao
Eng Siong Chng
Haizhou Li
Affiliations
[1] Shizuoka University, Graduate School of Engineering
[2] Nagaoka University of Technology, Temasek Laboratories @ NTU
[3] Nanyang Technological University, School of Computer Engineering
[4] Nanyang Technological University, Human Language Technology
[5] Institute for Infocomm Research
[6] A*STAR
Source
Journal of Signal Processing Systems for Signal Image and Video Technology, 2016, 82 (02)
Keywords
Speech recognition; Dereverberation; Denoising autoencoder; Environment adaptation; Distant-talking speech
DOI
Not available
Abstract
In this paper, we propose a robust distant-talking speech recognition method that combines a cepstral-domain denoising autoencoder (DAE) with a temporal structure normalization (TSN) filter. Because a DAE has a deep structure and nonlinear processing steps, it is flexible enough to model a highly nonlinear mapping between the input and output spaces. We train a DAE to map reverberant and noisy speech features to the underlying clean speech features in the cepstral domain. After the DAE has been applied in the cepstral domain to suppress reverberation, a TSN filter is applied as a post-processing step that further reduces noise and reverberation effects by normalizing the modulation spectra of the features to reference spectra of clean speech. The proposed method was evaluated using speech in simulated and real reverberant environments. Combining the cepstral-domain DAE with TSN reduced the average word error rate (WER) from 25.2% for the baseline system to 21.2% in simulated environments, and from 47.5% to 41.3% in real environments.
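The abstract reports absolute WER figures only. As a small worked example, the sketch below derives the absolute and relative reductions from those four numbers. The function name `wer_reduction` and the dictionary layout are illustrative assumptions and do not come from the paper; only the WER values are taken from the abstract.

```python
from typing import Dict, Tuple


def wer_reduction(baseline: float, proposed: float) -> Tuple[float, float]:
    """Return (absolute reduction in points, relative reduction in percent)."""
    absolute = baseline - proposed
    relative = 100.0 * absolute / baseline
    return absolute, relative


# WER values (percent) as reported in the abstract: (baseline, DAE + TSN).
reported = {
    "simulated": (25.2, 21.2),
    "real": (47.5, 41.3),
}

# Rounded to one decimal place to match the precision of the reported figures.
summary: Dict[str, Tuple[float, float]] = {}
for env, (baseline, proposed) in reported.items():
    absolute, relative = wer_reduction(baseline, proposed)
    summary[env] = (round(absolute, 1), round(relative, 1))
    print(f"{env}: -{summary[env][0]} points absolute, "
          f"-{summary[env][1]}% relative")
```

Under these figures, the relative reduction works out to roughly 16% in the simulated environments and roughly 13% in the real environments.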
Pages: 151-161
Page count: 10
Related papers
22 in total
  • [1] Single-channel dereverberation for distant-talking speech recognition by combining denoising autoencoder and temporal structure normalization
    Ueda, Yuma
    Wang, Longbiao
    Kai, Atsuhiko
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 379 - +
  • [2] Single-channel Dereverberation for Distant-Talking Speech Recognition by Combining Denoising Autoencoder and Temporal Structure Normalization
    Ueda, Yuma
    Wang, Longbiao
    Kai, Atsuhiko
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (02): : 151 - 161
  • [3] Environment-dependent denoising autoencoder for distant-talking speech recognition
    Ueda, Yuma
    Wang, Longbiao
    Kai, Atsuhiko
    Ren, Bo
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015,
  • [4] Environment-dependent denoising autoencoder for distant-talking speech recognition
    Yuma Ueda
    Longbiao Wang
    Atsuhiko Kai
    Bo Ren
    EURASIP Journal on Advances in Signal Processing, 2015
  • [5] Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording
    Wang, Longbiao
    Ren, Bo
    Ueda, Yuma
    Kai, Atsuhiko
    Teraoka, Shunta
    Fukushima, Taku
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
  • [6] Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition
    Bo Ren
    Longbiao Wang
    Liang Lu
    Yuma Ueda
    Atsuhiko Kai
    Multimedia Tools and Applications, 2016, 75 : 5093 - 5108
  • [7] Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition
    Ren, Bo
    Wang, Longbiao
    Lu, Liang
    Ueda, Yuma
    Kai, Atsuhiko
    MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (09) : 5093 - 5108
  • [8] MODEL-BASED DEREVERBERATION IN THE LOGMELSPEC DOMAIN FOR ROBUST DISTANT-TALKING SPEECH RECOGNITION
    Sehr, Armin
    Maas, Roland
    Kellermann, Walter
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4298 - 4301
  • [9] JOINT SPARSE REPRESENTATION BASED CEPSTRAL-DOMAIN DEREVERBERATION FOR DISTANT-TALKING SPEECH RECOGNITION
    Li, Weifeng
    Wang, Longbiao
    Zhou, Fei
    Liao, Qingmin
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7117 - 7120
  • [10] Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
    Zhang, Zhaofeng
    Wang, Longbiao
    Kai, Atsuhiko
    Yamada, Takanori
    Li, Weifeng
    Iwahashi, Masahiro
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015,