Exploration of On-device End-to-End Acoustic Modeling with Neural Networks

Cited by: 0
Authors
Sung, Wonyong [1 ]
Lee, Lukas [1 ]
Park, Jinhwan [1 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
Funding
National Research Foundation, Singapore
Keywords
speech recognition; embedded systems; neural networks; multi-time-step parallelization
DOI
10.1109/sips47522.2019.9020317
CLC number
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
Real-time speech recognition on mobile and embedded devices is an important application of neural networks. Acoustic modeling is the fundamental part of speech recognition and is usually implemented with long short-term memory (LSTM)-based recurrent neural networks (RNNs). However, single-threaded execution of an LSTM RNN is extremely slow on most embedded devices because the algorithm must fetch a large number of parameters from DRAM to compute each output sample. We explore several acoustic modeling algorithms that can be executed very efficiently on embedded devices. These algorithms reduce the overhead of memory accesses through multi-time-step parallelization, which computes multiple output samples at a time while reading the parameters from DRAM only once. The algorithms considered are quasi-recurrent neural networks (QRNNs), Gated ConvNets, and diagonalized LSTMs. In addition, we explore variants of these networks that add one-dimensional (1-D) convolution at each layer, which yields a very large performance improvement for QRNNs and Gated ConvNets. The experiments were conducted with connectionist temporal classification (CTC)-based end-to-end speech recognition on the WSJ corpus. Compared to LSTM RNN-based modeling, we not only significantly increase the execution speed but also obtain much higher accuracy. Thus, this work is applicable not only to embedded implementations but also to server-based ones.
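The key idea in the abstract is that a layer whose heavy matrix operations have no step-to-step dependence can amortize a single DRAM fetch of its weights over many time steps, whereas an LSTM must re-fetch its weight matrices at every step. The sketch below illustrates this with a simplified QRNN-style layer in NumPy; it is a minimal sketch under stated assumptions, not code from the paper, and all identifiers (qrnn_forward, Wz, Wf, k, etc.) are hypothetical. The gate computations for all T frames are batched into two matrix products (weights read once), and only the lightweight, parameter-free pooling recurrence runs sequentially.

import numpy as np

def qrnn_forward(x, Wz, Wf, bz, bf):
    # x: (T, k*d_in) -- k consecutive input frames stacked per time step,
    # so the width-k 1-D convolution becomes a single matrix product.
    # Wz, Wf: (k*d_in, d_out) weights for candidates and forget gates.
    #
    # One pass over Wz and Wf computes pre-activations for ALL T time
    # steps, so the weights are fetched from DRAM once, not T times.
    z = np.tanh(x @ Wz + bz)                    # candidates, (T, d_out)
    f = 1.0 / (1.0 + np.exp(-(x @ Wf + bf)))    # forget gates, (T, d_out)

    # The only sequential part is the parameter-free pooling recurrence
    # h_t = f_t * h_{t-1} + (1 - f_t) * z_t, which needs no weight fetch.
    h = np.empty_like(z)
    h_prev = np.zeros(z.shape[1], dtype=z.dtype)
    for t in range(z.shape[0]):
        h_prev = f[t] * h_prev + (1.0 - f[t]) * z[t]
        h[t] = h_prev
    return h

# Toy usage: T=100 frames of d_in=40 features, width-2 convolution.
rng = np.random.default_rng(0)
T, d_in, d_out, k = 100, 40, 64, 2
frames = rng.standard_normal((T + k - 1, d_in)).astype(np.float32)
x = np.stack([frames[t:t + k].ravel() for t in range(T)])  # (T, k*d_in)
Wz = 0.1 * rng.standard_normal((k * d_in, d_out)).astype(np.float32)
Wf = 0.1 * rng.standard_normal((k * d_in, d_out)).astype(np.float32)
h = qrnn_forward(x, Wz, Wf, np.zeros(d_out, np.float32), np.zeros(d_out, np.float32))
print(h.shape)  # (100, 64)

When the per-layer weights exceed the on-chip cache, this restructuring turns T weight fetches per layer into one, which is the memory-access saving that multi-time-step parallelization provides.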
Pages: 160-165
Page count: 6