A Study of the Recurrent Neural Network Encoder-Decoder for Large Vocabulary Speech Recognition

Cited by: 0

Authors:

Lu, Liang [1]

Zhang, Xingxing [2]

Cho, Kyunghyun [3]

Renals, Steve [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland

[2] Univ Edinburgh, Inst Language Cognit & Computat, Edinburgh, Midlothian, Scotland

[3] Univ Montreal, Montreal Inst Learning Algorithms, Montreal, PQ, Canada

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK

Keywords:

end-to-end speech recognition; deep neural networks; recurrent neural networks; encoder-decoder

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Deep neural networks have advanced the state of the art in automatic speech recognition when combined with hidden Markov models (HMMs). Recently, there has been interest in using systems based on recurrent neural networks (RNNs) to perform sequence modelling directly, without the requirement of an HMM superstructure. In this paper, we study the RNN encoder-decoder approach for large vocabulary end-to-end speech recognition, whereby an encoder transforms a sequence of acoustic vectors into a sequence of feature representations, from which a decoder recovers a sequence of words. We investigated this approach on the Switchboard corpus using a training set of around 300 hours of transcribed audio data. Without the use of an explicit language model or pronunciation lexicon, we achieved promising recognition accuracy, demonstrating that this approach warrants further investigation.

Pages: 3249-3253

Page count: 5
