Transfer learning for acoustic modeling of noise robust speech recognition

Cited by: 0
Authors
Yi J. [1 ,2 ]
Tao J. [1 ,2 ,3 ]
Liu B. [1 ]
Wen Z. [1 ]
Affiliations
[1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing
[2] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing
[3] CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing
Keywords
Acoustic model; Deep neural network; Robust speech recognition; Transfer learning;
DOI
10.16511/j.cnki.qhdxxb.2018.21.001
Abstract
Speech recognition in noisy environments was improved by using transfer learning to train acoustic models. An acoustic model trained on noisy data (the student model) is guided during training by an acoustic model trained on clean data (the teacher model). The training forces the posterior probability distribution of the student model to be close to that of the teacher model by minimizing the Kullback-Leibler (KL) divergence between the two distributions. Tests on the CHiME-2 dataset show that this method gives a 7.29% absolute average word error rate (WER) improvement over the baseline model and a 3.92% absolute average WER improvement over the best CHiME-2 system. © 2018, Tsinghua University Press. All rights reserved.
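The teacher-student training described in the abstract can be sketched in PyTorch as follows. This is a minimal illustration under assumptions, not the authors' implementation: the feed-forward architecture, layer sizes, feature dimension, and senone count are hypothetical placeholders chosen for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, NUM_SENONES = 40, 2000  # hypothetical filterbank dimension / senone targets

def make_acoustic_model():
    # Stand-in DNN acoustic model producing senone logits per frame.
    return nn.Sequential(
        nn.Linear(FEAT_DIM, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, NUM_SENONES),
    )

teacher = make_acoustic_model()   # assumed pre-trained on clean speech
student = make_acoustic_model()   # trained on the parallel noisy speech
teacher.eval()

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

def distillation_step(noisy_feats, clean_feats):
    # One update: pull the student's posterior (noisy input) toward the
    # teacher's posterior (clean input) by minimizing the KL divergence.
    with torch.no_grad():
        teacher_post = F.softmax(teacher(clean_feats), dim=-1)
    student_log_post = F.log_softmax(student(noisy_feats), dim=-1)
    # F.kl_div expects log-probabilities as the first argument.
    loss = F.kl_div(student_log_post, teacher_post, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random frames standing in for parallel clean/noisy features.
noisy = torch.randn(32, FEAT_DIM)
clean = torch.randn(32, FEAT_DIM)
print(distillation_step(noisy, clean))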
Pages: 55-60
Number of pages: 5