A Study of the Recurrent Neural Network Encoder-Decoder for Large Vocabulary Speech Recognition

Authors
Lu, Liang [1]
Zhang, Xingxing [2]
Cho, Kyunghyun [3]
Renals, Steve [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
[2] Univ Edinburgh, Inst Language Cognit & Computat, Edinburgh, Midlothian, Scotland
[3] Univ Montreal, Montreal Inst Learning Algorithms, Montreal, PQ, Canada
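Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
end-to-end speech recognition; deep neural networks; recurrent neural networks; encoder-decoder
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract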
Deep neural networks have advanced the state of the art in automatic speech recognition when combined with hidden Markov models (HMMs). Recently, there has been interest in using systems based on recurrent neural networks (RNNs) to perform sequence modelling directly, without requiring an HMM superstructure. In this paper, we study the RNN encoder-decoder approach for large vocabulary end-to-end speech recognition, in which an encoder transforms a sequence of acoustic vectors into a sequence of feature representations, from which a decoder recovers a sequence of words. We investigated this approach on the Switchboard corpus using a training set of around 300 hours of transcribed audio data. Without an explicit language model or pronunciation lexicon, we achieved promising recognition accuracy, demonstrating that this approach warrants further investigation.
Pages: 3249-3253
Page count: 5
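
The encoder-decoder pipeline described in the abstract (acoustic vectors in, feature representations in the middle, words out) can be illustrated with a toy sketch. This is not the authors' model: the plain tanh recurrence, the dot-product attention, the greedy search, the random untrained weights, the four-word vocabulary, and every function name below are assumptions chosen only to keep the example small and dependency-free.

```python
import math
import random

def make_matrix(rows, cols, rng):
    # Random, untrained weights (illustration only).
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rnn_step(w_in, w_rec, x, h):
    # Simple tanh recurrence: h' = tanh(W_in x + W_rec h).
    a, b = matvec(w_in, x), matvec(w_rec, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def encode(frames, w_in, w_rec, hidden):
    # Encoder: one feature representation (hidden state) per acoustic frame.
    h = [0.0] * hidden
    states = []
    for x in frames:
        h = rnn_step(w_in, w_rec, x, h)
        states.append(h)
    return states

def attend(states, query):
    # Assumed dot-product attention: weighted average of encoder states.
    weights = softmax([sum(a * b for a, b in zip(s, query)) for s in states])
    return [sum(w * s[k] for w, s in zip(weights, states))
            for k in range(len(query))]

def decode(states, w_in, w_rec, w_out, vocab, max_len=10):
    # Decoder: emits one word per step, conditioned on the attention context
    # and the previously emitted word; stops at the end token "</s>".
    h = list(states[-1])
    prev = vocab.index("</s>")  # the end token doubles as the start token
    words = []
    for _ in range(max_len):
        one_hot = [1.0 if i == prev else 0.0 for i in range(len(vocab))]
        h = rnn_step(w_in, w_rec, attend(states, h) + one_hot, h)
        probs = softmax(matvec(w_out, h))
        prev = probs.index(max(probs))  # greedy choice
        if vocab[prev] == "</s>":
            break
        words.append(vocab[prev])
    return words

rng = random.Random(0)
feat_dim, hidden = 4, 6
vocab = ["</s>", "hello", "world", "speech"]  # hypothetical word inventory
enc_in = make_matrix(hidden, feat_dim, rng)
enc_rec = make_matrix(hidden, hidden, rng)
dec_in = make_matrix(hidden, hidden + len(vocab), rng)
dec_rec = make_matrix(hidden, hidden, rng)
dec_out = make_matrix(len(vocab), hidden, rng)

# Eight random vectors stand in for a sequence of acoustic feature frames.
frames = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(8)]
states = encode(frames, enc_in, enc_rec, hidden)
hypothesis = decode(states, dec_in, dec_rec, dec_out, vocab)
print(len(states), hypothesis)
```

Because the weights are untrained, the printed word sequence is meaningless; the sketch only shows the data flow, with words produced directly by the decoder and no separate language model or pronunciation lexicon in the loop.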