A Study of the Recurrent Neural Network Encoder-Decoder for Large Vocabulary Speech Recognition

Cited by: 0

Authors:

Lu, Liang [1]

Zhang, Xingxing [2]

Cho, Kyunghyun [3]

Renals, Steve [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland

[2] Univ Edinburgh, Inst Language Cognit & Computat, Edinburgh, Midlothian, Scotland

[3] Univ Montreal, Montreal Inst Learning Algorithms, Montreal, PQ, Canada

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

end-to-end speech recognition; deep neural networks; recurrent neural networks; encoder-decoder;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Deep neural networks have advanced the state-of-the-art in automatic speech recognition, when combined with hidden Markov models (HMMs). Recently there has been interest in using systems based on recurrent neural networks (RNNs) to perform sequence modelling directly, without the requirement of an HMM superstructure. In this paper, we study the RNN encoder-decoder approach for large vocabulary end-to-end speech recognition, whereby an encoder transforms a sequence of acoustic vectors into a sequence of feature representations, from which a decoder recovers a sequence of words. We investigated this approach on the Switchboard corpus using a training set of around 300 hours of transcribed audio data. Without the use of an explicit language model or pronunciation lexicon, we achieved promising recognition accuracy, demonstrating that this approach warrants further investigation.

Pages: 3249-3253

Page count: 5
