Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices

Cited by: 0
Authors
Park, Jinhwan [1 ]
Boo, Yoonho [1 ]
Choi, Iksoo [1 ]
Shin, Sungho [1 ]
Sung, Wonyong [1 ]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Real-time automatic speech recognition (ASR) on mobile and embedded devices has been of great interest for many years. We present real-time speech recognition on smartphones and embedded systems by employing recurrent neural network (RNN) based acoustic models, RNN based language models, and beam-search decoding. The acoustic model is trained end-to-end with the connectionist temporal classification (CTC) loss. RNN implementations on embedded devices can suffer from excessive DRAM accesses, because the parameter size of a neural network usually exceeds the cache capacity and the parameters are used only once per time step. To remedy this problem, we employ a multi-time step parallelization approach that computes multiple output samples at a time with the parameters fetched from DRAM. Since the number of DRAM accesses is reduced in proportion to the number of parallelized time steps, a high processing speed can be achieved. However, conventional RNNs, such as the long short-term memory (LSTM) or the gated recurrent unit (GRU), do not permit multi-time step parallelization. We therefore construct the acoustic model by combining simple recurrent units (SRUs) with depth-wise 1-dimensional convolution layers, which allow multi-time step parallelization. Both character and word-piece models are developed for acoustic modeling, and the corresponding RNN based language models are used for beam-search decoding. We achieve a competitive WER on the WSJ corpus with a total model size of around 15 MB, and reach real-time speed on a single ARM core without a GPU or special hardware.
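As a rough illustration of the multi-time step parallelization idea, the sketch below shows why an SRU-based layer permits it: all of the SRU's matrix multiplications depend only on the inputs, so they can be batched over a block of time steps with a single fetch of the weights from DRAM, leaving only a cheap element-wise recurrence to run sequentially. This is a minimal NumPy sketch assuming the standard SRU formulation; the function name, block size, and gate layout are illustrative and not taken from the paper.

```python
# Minimal sketch (not the authors' code) of multi-time step parallelization
# with a Simple Recurrent Unit (SRU). The gate layout and highway output
# follow the standard SRU formulation and are assumptions.
import numpy as np

def sru_block(X, W, b, c0):
    """X: (T, d) inputs for a block of T time steps.
    W: (d, 3*d) stacked weights for [candidate, forget gate, reset gate].
    b: (2*d,) biases for [forget gate, reset gate].
    c0: (d,) initial cell state. Returns (H, c_T)."""
    T, d = X.shape
    # Step 1: one large matmul covering all T time steps -- the weight matrix
    # is fetched from DRAM once and reused T times, instead of once per step.
    U = X @ W                                   # (T, 3*d)
    x_tilde, zf, zr = U[:, :d], U[:, d:2*d], U[:, 2*d:]
    f = 1.0 / (1.0 + np.exp(-(zf + b[:d])))     # forget gates for all steps
    r = 1.0 / (1.0 + np.exp(-(zr + b[d:])))     # reset gates for all steps

    # Step 2: element-wise recurrence (no weight matrix), run sequentially.
    H = np.empty_like(X)
    c = c0
    for t in range(T):
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]           # cell update
        H[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * X[t]     # highway output
    return H, c

# Usage: process an utterance in blocks of, e.g., 8 time steps.
d, T = 4, 8
rng = np.random.default_rng(0)
H, cT = sru_block(rng.standard_normal((T, d)),
                  rng.standard_normal((d, 3 * d)),
                  np.zeros(2 * d), np.zeros(d))
print(H.shape)  # (8, 4)
```

In an LSTM or GRU, by contrast, the gate matrix multiplications take the previous hidden state as input, so the weights must be re-read on every time step; this is what motivates the combination of SRUs and depth-wise 1-D convolutions described in the abstract.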
Pages: 11
Related Papers
50 records in total
  • [1] MobiVSR: Efficient and Light-weight Neural Network for Visual Speech Recognition on Mobile Devices. Shrivastava, Nilay; Saxena, Astitwa; Kumar, Yaman; Shah, Rajiv Ratn; Stent, Amanda; Mahata, Debanjan; Kaur, Preeti; Zimmermann, Roger. INTERSPEECH 2019, 2019: 2753-2757.
  • [2] Speech recognition for mobile devices. Schmitt, Alexander; Zaykovskiy, Dmitry; Minker, Wolfgang. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2008, 11(02): 63-72.
  • [3] Speech Recognition on Mobile Devices. Tan, Zheng-Hua; Lindberg, Borge. MOBILE MULTIMEDIA PROCESSING: FUNDAMENTALS, METHODS, AND APPLICATIONS, 2010, 5960: 221-237.
  • [4] Joint Maximization Decoder with Neural Converters for Fully Neural Network-based Japanese Speech Recognition. Moriya, Takafumi; Wang, Jian; Tanaka, Tomohiro; Masumura, Ryo; Shinohara, Yusuke; Yamaguchi, Yoshikazu; Aono, Yushi. INTERSPEECH 2019, 2019: 4410-4414.
  • [5] Joint maximization decoder with neural converters for fully neural network-based Japanese speech recognition. Moriya, Takafumi; Wang, Jian; Tanaka, Tomohiro; Masumura, Ryo; Shinohara, Yusuke; Yamaguchi, Yoshikazu; Aono, Yushi. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019, 2019-September: 4410-4414.
  • [6] Plants recognition using embedded Convolutional Neural Networks on Mobile devices. Pechebovicz, Denise; Premebida, Sthefanie; Soares, Vinicios; Camargo, Thiago; Bittencourt, Jakson L.; Baroncini, Virginia; Martins, Marcella. 2020 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2020: 674-679.
  • [7] Noise-robust speech recognition in mobile network based on convolution neural networks. Bouchakour, Lallouani; Debyeche, Mohamed. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25: 269-277.
  • [8] Noise-robust speech recognition in mobile network based on convolution neural networks. Bouchakour, Lallouani; Debyeche, Mohamed. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25(01): 269-277.
  • [9] PERSONALIZED SPEECH RECOGNITION ON MOBILE DEVICES. McGraw, Ian; Prabhavalkar, Rohit; Alvarez, Raziel; Arenas, Montse Gonzalez; Rao, Kanishka; Rybach, David; Alsharif, Ouais; Sak, Hasim; Greenstein, Alexander; Beaufays, Francoise; Parada, Carolina. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016: 5955-5959.
  • [10] Speech Recognition for Mobile Devices at Google. Schuster, Mike. PRICAI 2010: TRENDS IN ARTIFICIAL INTELLIGENCE, 2010, 6230: 8-10.