SPEECH RECOGNITION IN UNSEEN AND NOISY CHANNEL CONDITIONS

Cited by: 0
Authors
Mitra, Vikramjit [1]
Franco, Horacio [1]
Bartels, Chris [1]
van Hout, Julien [1]
Graciarena, Martin [1]
Vergyri, Dimitra [1]
Affiliation
[1] SRI Int, Speech Technol & Res Lab, 333 Ravenswood Ave, Menlo Pk, CA 94025 USA
Keywords
automatic speech recognition; unsupervised adaptation; channel- and noise-robust speech recognition; auto-encoders; bottleneck features
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech recognition in varying background conditions is a challenging problem. Acoustic condition mismatch between training and evaluation data can significantly reduce recognition performance. For mismatched conditions, data-adaptation techniques are typically found to be useful, as they expose the acoustic model to the new data condition(s). Supervised adaptation techniques usually provide substantial performance improvement, but such gain is contingent on having labeled or transcribed data, which is often unavailable. The alternative is unsupervised adaptation, where feature-transform methods and model-adaptation techniques are typically explored. This work investigates robust features, feature-space maximum likelihood linear regression (fMLLR) transforms, and deep convolutional nets to address the problem of unseen channel and noise conditions. In addition, the work investigates bottleneck (BN) features extracted from deep autoencoder (DAE) networks trained on acoustic features extracted from the speech signal. We demonstrate that such representations not only produce robust systems but can also be used to perform data selection for unsupervised model adaptation. Our results indicate that the techniques presented in this paper significantly improve the performance of speech recognition systems in unseen channel and noise conditions.
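The bottleneck-feature idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the layer sizes, the tanh activation, the random (untrained) weights, and the helper names `init_autoencoder` and `forward` are all assumptions chosen for illustration. It only shows how the activations of a narrow hidden layer in an autoencoder are read out as BN features, while the final layer reconstructs the input frames.

```python
import numpy as np

def init_autoencoder(layer_sizes, seed=0):
    """Create random (untrained) weights for a fully connected autoencoder.

    layer_sizes is a hypothetical topology, e.g. [40, 256, 20, 256, 40].
    A real system would train these weights to reconstruct the input.
    """
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(frames, params, bn_layer):
    """Run frames through the network.

    Returns (bottleneck activations, reconstruction). bn_layer is the
    index of the layer whose output is taken as the BN feature vector.
    """
    h = frames
    bn = None
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        is_last = i == len(params) - 1
        h = z if is_last else np.tanh(z)  # linear output layer, tanh elsewhere
        if i == bn_layer:
            bn = h
    return bn, h

# Assumed setup: 100 frames of 40-dimensional acoustic features (synthetic here).
layer_sizes = [40, 256, 20, 256, 40]
params = init_autoencoder(layer_sizes)
frames = np.random.default_rng(1).normal(size=(100, 40))

bn_feats, recon = forward(frames, params, bn_layer=1)
print(bn_feats.shape, recon.shape)
```

In this sketch the 20-dimensional `bn_feats` would replace or augment the original acoustic features as input to a recognizer. How the paper uses such features for data selection is not described in the abstract, so that step is left out.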
Pages: 5215-5219
Number of pages: 5