Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices

Times cited: 0

Authors:

Park, Jinhwan [1]

Boo, Yoonho [1]

Choi, Iksoo [1]

Shin, Sungho [1]

Sung, Wonyong [1]

Affiliation:

[1] Seoul Natl Univ, Seoul, South Korea

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018) | 2018 / Vol. 31

Funding:

National Research Foundation of Singapore;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Real-time automatic speech recognition (ASR) on mobile and embedded devices has been of great interest for many years. We present real-time speech recognition on smartphones or embedded systems by employing recurrent neural network (RNN) based acoustic models, RNN based language models, and beam-search decoding. The acoustic model is trained end-to-end with the connectionist temporal classification (CTC) loss. The RNN implementation on embedded devices can suffer from excessive DRAM accesses because the parameter size of a neural network usually exceeds the cache size and each parameter is used only once per time step. To remedy this problem, we employ a multi-time step parallelization approach that computes multiple output samples at a time with the parameters fetched from DRAM once. Since the number of DRAM accesses can be reduced in proportion to the number of parallelized time steps, we can achieve a high processing speed. However, conventional RNNs, such as long short-term memory (LSTM) or gated recurrent unit (GRU), do not permit multi-time step parallelization. We construct an acoustic model by combining simple recurrent units (SRUs) and depth-wise 1-dimensional convolution layers for multi-time step parallelization. Both character and word piece models are developed for acoustic modeling, and the corresponding RNN based language models are used for beam search decoding. We achieve a competitive word error rate (WER) on the WSJ corpus with a total model size of around 15 MB, and achieve real-time speed using only a single ARM core without a GPU or special hardware.
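Why SRUs permit multi-time step parallelization while LSTMs and GRUs do not can be illustrated with a minimal NumPy sketch. It assumes the SRU formulation of Lei et al. (forget gate f_t, reset gate r_t, a highway connection, tanh activation) and equal input and hidden widths. In an SRU every matrix product depends only on the current input x_t, never on the previous output, so the products for a block of T frames can be computed in one pass over the weights, leaving only cheap element-wise recurrences sequential. In an LSTM or GRU the previous hidden state enters the matrix products, which forces one pass over the weights per frame. The function names are hypothetical, and this is not the authors' implementation (which also includes depth-wise 1-D convolution layers).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sru_multistep(x, W, Wf, bf, Wr, br, c0):
    """Hypothetical SRU layer processing a block of T frames with one weight pass.

    x: (T, d) input frames; W, Wf, Wr: (d, d) weights; bf, br, c0: (d,) vectors.
    """
    # Weight-dependent work: one (T, d) x (d, d) product per matrix,
    # so each weight is read once and reused for all T frames in the block.
    x_tilde = x @ W.T
    f = sigmoid(x @ Wf.T + bf)
    r = sigmoid(x @ Wr.T + br)
    # Only element-wise operations remain sequential in time.
    c = c0.copy()
    h = np.empty_like(x)
    for t in range(x.shape[0]):
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]
    return h, c


def sru_stepwise(x, W, Wf, bf, Wr, br, c0):
    """Reference version: the same SRU, applying the weights one frame at a time."""
    c = c0.copy()
    h = np.empty_like(x)
    for t in range(x.shape[0]):
        x_tilde = W @ x[t]
        f = sigmoid(Wf @ x[t] + bf)
        r = sigmoid(Wr @ x[t] + br)
        c = f * c + (1.0 - f) * x_tilde
        h[t] = r * np.tanh(c) + (1.0 - r) * x[t]
    return h, c


# Usage: both versions produce the same outputs for a block of 8 frames.
rng = np.random.default_rng(0)
T, d = 8, 16
x = rng.standard_normal((T, d))
W, Wf, Wr = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
bf, br, c0 = np.zeros(d), np.zeros(d), np.zeros(d)
h_block, c_block = sru_multistep(x, W, Wf, bf, Wr, br, c0)
h_step, c_step = sru_stepwise(x, W, Wf, bf, Wr, br, c0)
print(np.allclose(h_block, h_step))  # True
```

In the blocked version the weight matrices are read once per block of T frames rather than once per frame, which is the source of the proportional reduction in DRAM traffic described above; a larger T saves more memory traffic at the cost of buffering T frames of latency.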

Pages: 11
