HYBRID ACOUSTIC MODELS FOR DISTANT AND MULTICHANNEL LARGE VOCABULARY SPEECH RECOGNITION

Cited by: 0
Authors
Swietojanski, Pawel [1]
Ghoshal, Arnab [1]
Renals, Steve [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland
Funding
Engineering and Physical Sciences Research Council (UK)
Keywords
Distant Speech Recognition; Deep Neural Networks; Microphone Arrays; Beamforming; Meeting Recognition
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We investigate the application of deep neural network (DNN)-hidden Markov model (HMM) hybrid acoustic models for far-field speech recognition of meetings recorded using microphone arrays. We show that the hybrid models achieve significantly better accuracy than conventional systems based on Gaussian mixture models (GMMs). Relative to a discriminatively trained GMM baseline, we observe up to 8% absolute word error rate (WER) reduction when using a single distant microphone, and a 4-6% absolute WER reduction when using beamforming on various combinations of array channels. By training the networks on audio from multiple channels, we find that the networks can recover a significant part of the accuracy difference between the single distant microphone and beamformed configurations. Finally, we show that the accuracy of a network recognising speech from a single distant microphone can approach that of a multi-microphone setup when the network is trained with data from other microphones.
Pages: 285-290
Number of pages: 6