ON-DEVICE END-TO-END SPEECH RECOGNITION WITH MULTI-STEP PARALLEL RNNS

Cited by: 0

Authors:

Boo, Yoonho [1]

Park, Jinhwan [1]

Lee, Lukas [1]

Sung, Wonyong [1]

Affiliations:

[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea

Source:

2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018) | 2018

Funding:

National Research Foundation, Singapore;

Keywords:

End-to-end speech recognition; multi-step parallel RNN; personal devices;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Most automatic speech recognition is currently performed on remote servers. However, demand for speech recognition on personal devices is increasing, owing to the need for shorter recognition latency and greater privacy. End-to-end speech recognition that employs recurrent neural networks (RNNs) shows good accuracy, but executing conventional RNNs, such as long short-term memory (LSTM) or gated recurrent unit (GRU) networks, demands many memory accesses, which hinders real-time execution on smartphones or embedded systems. To solve this problem, we built an end-to-end acoustic model (AM) using linear recurrent units instead of LSTM or GRU and employed a multi-step parallel approach to reduce the number of DRAM accesses. The AM is trained with the connectionist temporal classification (CTC) loss, and decoding is conducted using weighted finite-state transducers (WFSTs). The proposed system achieves 4.8x real-time speed when executed on a single core of an ARM CPU-based system.
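
The multi-step parallel idea in the abstract can be illustrated with a minimal sketch. It assumes a simple linear recurrence whose gate depends only on the input frame, so the weight matrices never touch the previous hidden state; the gate equations, the names `project` and `run`, and the fetch counter are all hypothetical illustrations, not the formulation or code used in the paper. Because the recurrence itself is element-wise, the input projections for several frames can be computed in one pass over the weights, which is the property that lets a block of frames share a single weight fetch.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def project(W, frames):
    """Multiply weight matrix W by every frame, reading each row of W once."""
    out = [[0.0] * len(W) for _ in frames]
    for r, row in enumerate(W):          # one read of this weight row ...
        for t, x in enumerate(frames):   # ... reused for every frame in the block
            out[t][r] = sum(w * xi for w, xi in zip(row, x))
    return out


def run(frames, Wz, Wf, block):
    """Run the illustrative linear recurrence, `block` frames per weight fetch.

    Returns (hidden states, number of times the weights were fetched).
    block=1 corresponds to the conventional step-by-step schedule.
    Assumed recurrence (not taken from the paper):
        h_t = g_t * h_{t-1} + (1 - g_t) * z_t,  g_t = sigmoid(Wf x_t),  z_t = Wz x_t
    """
    h = [0.0] * len(Wz)
    states, fetches = [], 0
    for start in range(0, len(frames), block):
        chunk = frames[start:start + block]
        zs = project(Wz, chunk)   # candidate values for the whole block
        fs = project(Wf, chunk)   # gate pre-activations for the whole block
        fetches += 1              # weights were streamed once for this block
        for z, f in zip(zs, fs):
            g = [sigmoid(v) for v in f]
            # Element-wise update: no weight matrix is needed here.
            h = [gi * hi + (1.0 - gi) * zi for gi, hi, zi in zip(g, h, z)]
            states.append(h)
    return states, fetches


if __name__ == "__main__":
    random.seed(0)
    dim, steps = 4, 8
    Wz = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    Wf = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    frames = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(steps)]

    _, single_fetches = run(frames, Wz, Wf, block=1)
    _, multi_fetches = run(frames, Wz, Wf, block=4)
    print(single_fetches, multi_fetches)  # 8 2
```

In this sketch both schedules produce the same hidden states; only the number of passes over the weights changes. An LSTM or GRU cannot be blocked this way as directly, because its gates multiply weight matrices by the previous hidden state at every step.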


Pages: 376-381

Page count: 6
