EXPLORING NEURAL TRANSDUCERS FOR END-TO-END SPEECH RECOGNITION

Cited by: 0

Authors:

Battenberg, Eric [1]
Chen, Jitong [1]
Child, Rewon [1]
Coates, Adam [1]
Gaur, Yashesh [1]
Li, Yi [1]
Liu, Hairong [1]
Satheesh, Sanjeev [1]
Sriram, Anuroop [1]
Zhu, Zhenyao [1]

Affiliation:

[1] Baidu Silicon Valley AI Lab, Sunnyvale, CA 94089 USA

Source:

2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2017

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this work, we perform an empirical comparison of CTC, RNN-Transducer, and attention-based Seq2Seq models for end-to-end speech recognition. We show that, without any language model, Seq2Seq and RNN-Transducer models both outperform the best reported CTC models with a language model on the popular Hub5'00 benchmark. On our internal diverse dataset, these trends continue: RNN-Transducer models rescored with a language model after beam search outperform our best CTC models. These results simplify the speech recognition pipeline so that decoding can now be expressed purely as neural network operations. We also study how the choice of encoder architecture affects the performance of the three models, both when all encoder layers are forward-only and when encoders aggressively downsample the input representation.
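The abstract's last sentence concerns encoders that aggressively downsample the input. The following is a minimal Python sketch, not taken from the paper, of one well-known length constraint that downsampling interacts with: a CTC alignment needs at least one encoder frame per output label, plus an extra blank frame between identical neighbouring labels, while an RNN-Transducer can emit several labels on the same encoder frame. It assumes plain stride-based downsampling with ceiling division and character-level labels; the function names and the 100-frame utterance are hypothetical, and the sketch says nothing about the accuracy effects the paper actually measured.

```python
import math


def downsampled_length(num_frames: int, stride: int) -> int:
    """Encoder output length after striding by `stride` (hypothetical: ceiling division)."""
    return math.ceil(num_frames / stride)


def ctc_min_frames(labels: list[str]) -> int:
    """Minimum encoder frames a CTC alignment needs for `labels`:
    one frame per label, plus one blank between each pair of identical neighbours."""
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def ctc_feasible(num_frames: int, stride: int, labels: list[str]) -> bool:
    """True if the downsampled sequence is long enough for any CTC alignment."""
    return downsampled_length(num_frames, stride) >= ctc_min_frames(labels)


def rnnt_feasible(num_frames: int, stride: int, labels: list[str]) -> bool:
    """An RNN-T may emit any number of labels before consuming a frame,
    so any non-empty encoder output admits an alignment for any label sequence."""
    return downsampled_length(num_frames, stride) >= 1


if __name__ == "__main__":
    # Hypothetical utterance: 100 input frames, transcribed at the character level.
    labels = list("hello world")  # 11 characters, one repeated pair ("ll")
    for stride in (2, 4, 8, 16):
        print(
            stride,
            downsampled_length(100, stride),
            ctc_feasible(100, stride, labels),
            rnnt_feasible(100, stride, labels),
        )
```

With a stride of 16 the 100 frames shrink to 7, fewer than the 12 frames CTC needs for "hello world", while the transducer lattice still admits an alignment; at a stride of 8 (13 frames) CTC is only barely feasible.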


Pages: 206-213

Page count: 8
