Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices

Authors
Park, Jinhwan [1]
Boo, Yoonho [1]
Choi, Iksoo [1]
Shin, Sungho [1]
Sung, Wonyong [1]
Affiliation
[1] Seoul National University, Seoul, South Korea
Funding
National Research Foundation, Singapore
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Real-time automatic speech recognition (ASR) on mobile and embedded devices has been of great interest for many years. We present real-time speech recognition on smartphones and embedded systems that employs recurrent neural network (RNN) based acoustic models, RNN-based language models, and beam-search decoding. The acoustic model is trained end to end with the connectionist temporal classification (CTC) loss. RNN implementations on embedded devices can suffer from excessive DRAM accesses, because the parameter size of a neural network usually exceeds the cache capacity and each parameter is used only once per time step. To remedy this problem, we employ a multi-time step parallelization approach that computes the outputs of multiple time steps at once with each set of parameters fetched from DRAM. Since the number of DRAM accesses can be reduced in proportion to the number of time steps processed in parallel, a high processing speed can be achieved. However, conventional RNNs, such as long short-term memory (LSTM) or gated recurrent unit (GRU) networks, do not permit multi-time step parallelization, because their hidden-to-hidden matrix products depend on the output of the previous time step. We therefore construct an acoustic model that combines simple recurrent units (SRUs) with depth-wise 1-dimensional convolution layers to enable multi-time step parallelization. Both character and word-piece acoustic models are developed, and the corresponding RNN-based language models are used for beam-search decoding. We achieve a competitive word error rate (WER) on the Wall Street Journal (WSJ) corpus with a total model size of about 15 MB, and achieve real-time speed using only a single ARM core without a GPU or special hardware.
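The abstract's key argument is that multi-time step parallelization works only when a recurrent layer's matrix products do not depend on the previous step's output. The following is a minimal pure-Python sketch of that idea, assuming the basic SRU recurrence (forget gate, reset gate, highway connection, tanh activation). It is not the authors' implementation: the class `SRULayer`, the helper `matvec_block`, the `block` parameter, and the `weight_passes` counter (a stand-in for DRAM weight fetches) are hypothetical, and the depth-wise 1-D convolution layers are omitted. In an LSTM or GRU, by contrast, the gates also need a product with h_{t-1}, so that product cannot be batched over future time steps.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec_block(W, xs):
    """Multiply weight matrix W by every input vector in xs.

    The outer loop walks over W once and reuses each row for all vectors
    in the block, so the weights are read once per block rather than
    once per time step."""
    out = [[0.0] * len(W) for _ in xs]
    for i, row in enumerate(W):
        for t, x in enumerate(xs):
            out[t][i] = sum(w * v for w, v in zip(row, x))
    return out


class SRULayer:
    """Basic SRU recurrence (hypothetical sketch, not the paper's code)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def init():
            return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

        self.dim = dim
        self.W, self.Wf, self.Wr = init(), init(), init()
        self.bf = [0.0] * dim
        self.br = [0.0] * dim
        self.weight_passes = 0  # stand-in for DRAM weight fetches

    def forward(self, xs, block):
        c = [0.0] * self.dim  # cell state carried across chunks
        hs = []
        for start in range(0, len(xs), block):
            chunk = xs[start:start + block]
            # These products depend only on x_t, not on h_{t-1} or c_{t-1},
            # so the whole chunk is handled with one pass over the weights.
            xw = matvec_block(self.W, chunk)
            fw = matvec_block(self.Wf, chunk)
            rw = matvec_block(self.Wr, chunk)
            self.weight_passes += 1
            # Only this element-wise recurrence remains sequential.
            for k, x in enumerate(chunk):
                h = []
                for i in range(self.dim):
                    f = sigmoid(fw[k][i] + self.bf[i])  # forget gate
                    r = sigmoid(rw[k][i] + self.br[i])  # reset gate
                    c[i] = f * c[i] + (1.0 - f) * xw[k][i]
                    h.append(r * math.tanh(c[i]) + (1.0 - r) * x[i])  # highway
                hs.append(h)
        return hs


# Usage: 32 random 8-dimensional "feature frames".
rng = random.Random(1)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(32)]

step_by_step = SRULayer(8)  # same default seed, so identical weights
blocked = SRULayer(8)
h1 = step_by_step.forward(frames, block=1)
h4 = blocked.forward(frames, block=4)
print(step_by_step.weight_passes, blocked.weight_passes)  # 32 8
```

With `block=4`, the weights are traversed 8 times instead of 32 for the same 32 frames, while the outputs are identical, which mirrors the proportional reduction in DRAM accesses described above. The trade-off is latency: a chunk of frames must be buffered before it can be processed.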
Pages: 11