Exploration of On-device End-to-End Acoustic Modeling with Neural Networks

Cited by: 0
Authors
Sung, Wonyong [1]
Lee, Lukas [1]
Park, Jinhwan [1]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
Funding
National Research Foundation, Singapore
Keywords
speech recognition; embedded systems; neural networks; multi-time-step parallelization
DOI
10.1109/sips47522.2019.9020317
Chinese Library Classification
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
Real-time speech recognition on mobile and embedded devices is an important application of neural networks. Acoustic modeling is the fundamental part of speech recognition and is usually implemented with long short-term memory (LSTM)-based recurrent neural networks (RNNs). However, single-threaded execution of an LSTM RNN is extremely slow on most embedded devices because the algorithm must fetch a large number of parameters from DRAM to compute each output sample. We explore a few acoustic modeling algorithms that can be executed very efficiently on embedded devices. These algorithms reduce memory-access overhead through multi-time-step parallelization, which computes multiple output samples at a time while reading the parameters from DRAM only once. The algorithms considered are quasi-RNNs (QRNNs), Gated ConvNets, and diagonalized LSTMs. In addition, we explore networks that add a one-dimensional (1-D) convolution at each layer of these algorithms, which yields a very large performance increase for QRNNs and Gated ConvNets. The experiments were conducted using connectionist temporal classification (CTC)-based end-to-end speech recognition on the WSJ corpus. We not only significantly increase the execution speed but also obtain much higher accuracy than LSTM RNN-based modeling. Thus, this work is applicable not only to embedded system-based implementations but also to server-based ones.
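The multi-time-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified QRNN-style layer with filter width 1, and the function names (`gates_per_step`, `gates_multi_step`, `pooled_states`) and the `fetches` counter are hypothetical, introduced only to make the weight reuse visible. The counter stands in for DRAM parameter reads; real hardware behavior also depends on caches and blocking.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gates_per_step(W, xs):
    """Baseline: every time step re-reads the whole weight matrix."""
    fetches = 0
    out = []
    for x in xs:  # one full pass over W per time step
        step_out = []
        for w_row in W:
            fetches += len(w_row)  # weights read again for this step
            step_out.append(sum(w * v for w, v in zip(w_row, x)))
        out.append(step_out)
    return out, fetches


def gates_multi_step(W, xs):
    """Multi-time-step: each weight row is read once and reused for all steps."""
    fetches = 0
    out = [[0.0] * len(W) for _ in xs]
    for j, w_row in enumerate(W):
        fetches += len(w_row)  # weights read once for the whole block
        for t, x in enumerate(xs):
            out[t][j] = sum(w * v for w, v in zip(w_row, x))
    return out, fetches


def pooled_states(pre_f, pre_z):
    """Element-wise recurrence c_t = f_t * c_{t-1} + (1 - f_t) * z_t.

    Only this cheap part is sequential; it touches no weight matrix.
    """
    c = [0.0] * len(pre_f[0])
    states = []
    for pf, pz in zip(pre_f, pre_z):
        c = [
            sigmoid(a) * ci + (1.0 - sigmoid(a)) * math.tanh(b)
            for a, b, ci in zip(pf, pz, c)
        ]
        states.append(c)
    return states
```

The reordering is possible because the gate pre-activations depend only on the inputs, not on the previous hidden state, so a block of time steps can share one pass over the weights. A standard LSTM cannot be reordered this way, since its gates depend on the previous output.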
Pages: 160-165
Page count: 6