ON-DEVICE END-TO-END SPEECH RECOGNITION WITH MULTI-STEP PARALLEL RNNS

Cited by: 0

Authors:

Boo, Yoonho [1]

Park, Jinhwan [1]

Lee, Lukas [1]

Sung, Wonyong [1]

Affiliations:

[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea

Source:

2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018) | 2018

Funding:

National Research Foundation, Singapore;

Keywords:

End-to-end speech recognition; multi-step parallel RNN; personal devices;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Most automatic speech recognition is currently performed on remote servers. However, demand for speech recognition on personal devices is increasing, driven by the need for shorter recognition latency and greater privacy. End-to-end speech recognition that employs recurrent neural networks (RNNs) shows good accuracy, but executing conventional RNNs, such as long short-term memory (LSTM) or gated recurrent unit (GRU) networks, demands many memory accesses, which hinders real-time execution on smartphones and embedded systems. To solve this problem, we built an end-to-end acoustic model (AM) using linear recurrent units instead of LSTM or GRU and employed a multi-step parallel approach to reduce the number of DRAM accesses. The AM is trained with the connectionist temporal classification (CTC) loss, and decoding is conducted using weighted finite-state transducers (WFSTs). The proposed system achieves 4.8x real-time speed when executed on a single core of an ARM CPU-based system.
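
The abstract does not give the recurrence equations, so the following is only a minimal sketch of the multi-step parallel idea under assumed details. It assumes a linear recurrent unit whose gates depend on the current input only (c_t = f_t * c_{t-1} + (1 - f_t) * z_t), which lets the weight-matrix products for several frames be computed in one pass over the weights. The function names, the gate form, and the `weight_row_reads` counter (a rough stand-in for DRAM traffic) are all hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dot(row, x):
    return sum(w * xi for w, xi in zip(row, x))


def random_matrix(rows, cols, rng):
    # Hypothetical helper: small random weights for the demo.
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def step_by_step(Wf, Wz, xs):
    """Baseline: every weight row is fetched again for each frame."""
    c = [0.0] * len(Wf)
    outputs, weight_row_reads = [], 0
    for x in xs:
        f = [sigmoid(dot(row, x)) for row in Wf]
        z = [math.tanh(dot(row, x)) for row in Wz]
        weight_row_reads += len(Wf) + len(Wz)
        # Element-wise linear recurrence (assumed form, no hidden-to-hidden matrix).
        c = [fi * ci + (1.0 - fi) * zi for fi, ci, zi in zip(f, c, z)]
        outputs.append(c)
    return outputs, weight_row_reads


def multi_step(Wf, Wz, xs, block):
    """Multi-step variant: each weight row is fetched once per block of frames."""
    c = [0.0] * len(Wf)
    outputs, weight_row_reads = [], 0
    for start in range(0, len(xs), block):
        chunk = xs[start:start + block]
        # One pass over the weights serves every frame in the chunk.
        F = [[sigmoid(dot(row, x)) for x in chunk] for row in Wf]
        Z = [[math.tanh(dot(row, x)) for x in chunk] for row in Wz]
        weight_row_reads += len(Wf) + len(Wz)
        # The recurrence itself stays sequential but touches no weight matrix.
        for t in range(len(chunk)):
            c = [F[j][t] * c[j] + (1.0 - F[j][t]) * Z[j][t] for j in range(len(c))]
            outputs.append(c)
    return outputs, weight_row_reads
```

Both functions produce the same outputs; under these assumptions the multi-step version reads the weights once per block rather than once per frame, which is the kind of saving the abstract attributes to reduced DRAM accesses. A conventional LSTM or GRU would not allow this directly, because its gates also depend on the previous hidden state.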

Pages: 376-381

Page count: 6
