A Study of the Recurrent Neural Network Encoder-Decoder for Large Vocabulary Speech Recognition

Cited by: 0
Authors
Lu, Liang [1]
Zhang, Xingxing [2]
Cho, Kyunghyun [3]
Renals, Steve [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
[2] Univ Edinburgh, Inst Language Cognit & Computat, Edinburgh, Midlothian, Scotland
[3] Univ Montreal, Montreal Inst Learning Algorithms, Montreal, PQ, Canada
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
end-to-end speech recognition; deep neural networks; recurrent neural networks; encoder-decoder
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep neural networks have advanced the state-of-the-art in automatic speech recognition, when combined with hidden Markov models (HMMs). Recently there has been interest in using systems based on recurrent neural networks (RNNs) to perform sequence modelling directly, without the requirement of an HMM superstructure. In this paper, we study the RNN encoder-decoder approach for large vocabulary end-to-end speech recognition, whereby an encoder transforms a sequence of acoustic vectors into a sequence of feature representations, from which a decoder recovers a sequence of words. We investigated this approach on the Switchboard corpus using a training set of around 300 hours of transcribed audio data. Without the use of an explicit language model or pronunciation lexicon, we achieved promising recognition accuracy, demonstrating that this approach warrants further investigation.
Pages: 3249-3253
Page count: 5