Robust Speech Recognition using Long Short-Term Memory Recurrent Neural Networks for Hybrid Acoustic Modelling

被引:0
|
作者
Geiger, Juergen T. [1 ]
Zhang, Zixing [1 ]
Weninger, Felix [1 ]
Schuller, Bjoern [1 ,2 ]
Rigoll, Gerhard [1 ]
机构
[1] Tech Univ Munich, Inst Human Machine Commun, Munich, Germany
[2] Imperial Coll London, Dept Comp, London, England
关键词
acoustic modelling; robust speech recognition; neural networks; long short-term memory;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
One method to achieve robust speech recognition in adverse conditions including noise and reverberation is to employ acoustic modelling techniques involving neural networks. Using long short-term memory (LSTM) recurrent neural networks proved to be efficient for this task in a setup for phoneme prediction in a multi-stream GMM-HMM framework. These networks exploit a self-learnt amount of temporal context, which makes them especially suited for a noisy speech recognition task. One shortcoming of this approach is the necessity of a GMM acoustic model in the multi-stream framework. Furthermore, potential modelling power of the network is lost when predicting phonemes, compared to the classical hybrid setup where the network predicts HMM states. In this work, we propose to use LSTM networks in a hybrid HMM setup, in order to overcome these drawbacks. Experiments are performed using the medium-vocabulary recognition track of the 2nd CHiME challenge, containing speech utterances in a reverberant and noisy environment. A comparison of different network topologies for phoneme or state prediction used either in the hybrid or double-stream setup shows that state prediction networks perform better than networks predicting phonemes, leading to stateof-the-art results for this database.
引用
收藏
页码:631 / 635
页数:5
相关论文
共 50 条
  • [1] BIDIRECTIONAL QUATERNION LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS FOR SPEECH RECOGNITION
    Parcollet, Titouan
    Morchid, Mohamed
    Linares, Georges
    De Mori, Renato
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 8519 - 8523
  • [2] Long Short-Term Memory Networks for Noise Robust Speech Recognition
    Woellmer, Martin
    Sun, Yang
    Eyben, Florian
    Schuller, Bjoern
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2966 - 2969
  • [3] LOMBARD SPEECH SYNTHESIS USING LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS
    Bollepalli, Bajibabu
    Airaksinen, Manu
    Alku, Paavo
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5505 - 5509
  • [4] Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition
    Oruh, Jane
    Viriri, Serestina
    Adegun, Adekanmi
    [J]. IEEE ACCESS, 2022, 10 : 30069 - 30079
  • [5] SPEECH ENHANCEMENT USING LONG SHORT-TERM MEMORY BASED RECURRENT NEURAL NETWORKS FOR NOISE ROBUST SPEAKER VERIFICATION
    Kolbaek, Morten
    Tan, Zheng-Hua
    Jensen, Jesper
    [J]. 2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 305 - 311
  • [6] Long Short-Term Memory based Convolutional Recurrent Neural Networks for Large Vocabulary Speech Recognition
    Li, Xiangang
    Wu, Xihong
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3219 - 3223
  • [7] Detecting Overlapping Speech with Long Short-Term Memory Recurrent Neural Networks
    Geiger, Juergen T.
    Eyben, Florian
    Schuller, Bjoern
    Rigoll, Gerhard
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1667 - 1671
  • [8] Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks
    Chen, Zhuo
    Watanabe, Shinji
    Erdogan, Hakan
    Hershey, John R.
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3274 - 3278
  • [9] CONSTRUCTING LONG SHORT-TERM MEMORY BASED DEEP RECURRENT NEURAL NETWORKS FOR LARGE VOCABULARY SPEECH RECOGNITION
    Li, Xianggang
    Wu, Xihong
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4520 - 4524
  • [10] Towards End-to-End Speech Recognition for Chinese Mandarin using Long Short-Term Memory Recurrent Neural Networks
    Li, Jie
    Zhang, Heng
    Cai, Xinyuan
    Xu, Bo
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3615 - 3619