Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices

Cited by: 0
Authors
Park, Jinhwan [1 ]
Boo, Yoonho [1 ]
Choi, Iksoo [1 ]
Shin, Sungho [1 ]
Sung, Wonyong [1 ]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Real-time automatic speech recognition (ASR) on mobile and embedded devices has been of great interest for many years. We present real-time speech recognition on smartphones and embedded systems by employing recurrent neural network (RNN) based acoustic models, RNN based language models, and beam-search decoding. The acoustic model is trained end-to-end with the connectionist temporal classification (CTC) loss. RNN implementations on embedded devices can suffer from excessive DRAM accesses, because the parameter size of a neural network usually exceeds the cache capacity and the parameters are used only once per time step. To remedy this problem, we employ a multi-time step parallelization approach that computes multiple output samples at a time with the parameters fetched from DRAM. Since the number of DRAM accesses is reduced in proportion to the number of parallelized time steps, a high processing speed can be achieved. However, conventional RNNs, such as the long short-term memory (LSTM) or the gated recurrent unit (GRU), do not permit multi-time step parallelization. We therefore construct the acoustic model by combining simple recurrent units (SRUs) with depth-wise 1-dimensional convolution layers, which allow multi-time step parallelization. Both character and word-piece models are developed for acoustic modeling, and the corresponding RNN based language models are used for beam-search decoding. We achieve a competitive WER on the WSJ corpus with a total model size of around 15 MB, and reach real-time speed on a single ARM core without a GPU or special hardware.
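As a rough illustration of the multi-time step parallelization idea, the sketch below shows why an SRU-based layer permits it: all of the SRU's matrix multiplications depend only on the inputs, so they can be batched over a block of time steps with a single fetch of the weights from DRAM, leaving only a cheap element-wise recurrence to run sequentially. This is a minimal NumPy sketch assuming the standard SRU formulation; the function name, block size, and gate layout are illustrative and not taken from the paper.

```python
# Minimal sketch (not the authors' code) of multi-time step parallelization
# with a Simple Recurrent Unit (SRU). The gate layout and highway output
# follow the standard SRU formulation and are assumptions.
import numpy as np

def sru_block(X, W, b, c0):
    """X: (T, d) inputs for a block of T time steps.
    W: (d, 3*d) stacked weights for [candidate, forget gate, reset gate].
    b: (2*d,) biases for [forget gate, reset gate].
    c0: (d,) initial cell state. Returns (H, c_T)."""
    T, d = X.shape
    # Step 1: one large matmul covering all T time steps -- the weight matrix
    # is fetched from DRAM once and reused T times, instead of once per step.
    U = X @ W                                   # (T, 3*d)
    x_tilde, zf, zr = U[:, :d], U[:, d:2*d], U[:, 2*d:]
    f = 1.0 / (1.0 + np.exp(-(zf + b[:d])))     # forget gates for all steps
    r = 1.0 / (1.0 + np.exp(-(zr + b[d:])))     # reset gates for all steps

    # Step 2: element-wise recurrence (no weight matrix), run sequentially.
    H = np.empty_like(X)
    c = c0
    for t in range(T):
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]           # cell update
        H[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * X[t]     # highway output
    return H, c

# Usage: process an utterance in blocks of, e.g., 8 time steps.
d, T = 4, 8
rng = np.random.default_rng(0)
H, cT = sru_block(rng.standard_normal((T, d)),
                  rng.standard_normal((d, 3 * d)),
                  np.zeros(2 * d), np.zeros(d))
print(H.shape)  # (8, 4)
```

In an LSTM or GRU, by contrast, the gate matrix multiplications take the previous hidden state as input, so the weights must be re-read on every time step; this is what motivates the combination of SRUs and depth-wise 1-D convolutions described in the abstract.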
Pages: 11
Related Papers
50 records in total
  • [1] MobiVSR: Efficient and Light-weight Neural Network for Visual Speech Recognition on Mobile Devices. Shrivastava, Nilay; Saxena, Astitwa; Kumar, Yaman; Shah, Rajiv Ratn; Stent, Amanda; Mahata, Debanjan; Kaur, Preeti; Zimmermann, Roger. INTERSPEECH 2019, 2019: 2753-2757.
  • [2] Speech recognition for mobile devices. Schmitt, Alexander; Zaykovskiy, Dmitry; Minker, Wolfgang. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2008, 11(02): 63-72.
  • [3] Speech Recognition on Mobile Devices. Tan, Zheng-Hua; Lindberg, Borge. MOBILE MULTIMEDIA PROCESSING: FUNDAMENTALS, METHODS, AND APPLICATIONS, 2010, 5960: 221-237.
  • [4] Joint Maximization Decoder with Neural Converters for Fully Neural Network-based Japanese Speech Recognition. Moriya, Takafumi; Wang, Jian; Tanaka, Tomohiro; Masumura, Ryo; Shinohara, Yusuke; Yamaguchi, Yoshikazu; Aono, Yushi. INTERSPEECH 2019, 2019: 4410-4414.
  • [5] Joint maximization decoder with neural converters for fully neural network-based Japanese speech recognition. Moriya, Takafumi; Wang, Jian; Tanaka, Tomohiro; Masumura, Ryo; Shinohara, Yusuke; Yamaguchi, Yoshikazu; Aono, Yushi. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, 2019-September: 4410-4414.
  • [6] Plants recognition using embedded Convolutional Neural Networks on Mobile devices. Pechebovicz, Denise; Premebida, Sthefanie; Soares, Vinicios; Camargo, Thiago; Bittencourt, Jakson L.; Baroncini, Virginia; Martins, Marcella. 2020 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2020: 674-679.
  • [7] Noise-robust speech recognition in mobile network based on convolution neural networks. Bouchakour, Lallouani; Debyeche, Mohamed. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25: 269-277.
  • [8] Noise-robust speech recognition in mobile network based on convolution neural networks. Bouchakour, Lallouani; Debyeche, Mohamed. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25(01): 269-277.
  • [9] PERSONALIZED SPEECH RECOGNITION ON MOBILE DEVICES. McGraw, Ian; Prabhavalkar, Rohit; Alvarez, Raziel; Arenas, Montse Gonzalez; Rao, Kanishka; Rybach, David; Alsharif, Ouais; Sak, Hasim; Greenstein, Alexander; Beaufays, Francoise; Parada, Carolina. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016: 5955-5959.
  • [10] Speech Recognition for Mobile Devices at Google. Schuster, Mike. PRICAI 2010: TRENDS IN ARTIFICIAL INTELLIGENCE, 2010, 6230: 8-10.