ON-DEVICE END-TO-END SPEECH RECOGNITION WITH MULTI-STEP PARALLEL RNNS

Authors
Boo, Yoonho [1]
Park, Jinhwan [1]
Lee, Lukas [1]
Sung, Wonyong [1]
Affiliation
[1] Seoul National University, Department of Electrical and Computer Engineering, Seoul 08826, South Korea
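Funding
National Research Foundation of Singapore
Keywords
End-to-end speech recognition; multi-step parallel RNN; personal devices
DOI
Not available
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405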
Abstract
Most automatic speech recognition is currently performed on remote servers. However, demand for speech recognition on personal devices is increasing, driven by the need for shorter recognition latency and greater privacy. End-to-end speech recognition based on recurrent neural networks (RNNs) achieves good accuracy, but executing conventional RNNs such as long short-term memory (LSTM) or gated recurrent units (GRUs) requires many memory accesses, which hinders real-time execution on smartphones and embedded systems. To address this problem, we built an end-to-end acoustic model (AM) that uses linear recurrent units instead of LSTM or GRU, and we employed a multi-step parallel approach to reduce the number of DRAM accesses. The AM is trained with the connectionist temporal classification (CTC) loss, and decoding is performed with weighted finite-state transducers (WFSTs). The proposed system achieves 4.8× real-time speed when executed on a single core of an ARM CPU-based system.
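The abstract does not specify the exact recurrent unit or blocking scheme. The sketch below assumes a QRNN/SRU-style gated linear recurrence, h_t = f_t * h_(t-1) + (1 - f_t) * z_t, in which the gates f_t and candidates z_t depend only on the input x_t. Because h_(t-1) never passes through a matrix product, the input projections for a block of frames can be computed with a single matrix-matrix product, so the weight matrices are streamed from memory once per block rather than once per frame. The function names, the `block` parameter, and the `weight_reads` counter are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def linear_rnn_stepwise(x, w_z, w_f):
    """Reference version: processes one frame at a time.

    Returns the hidden states and the number of passes over the weights.
    (Hypothetical counter; assumes the weights do not stay in cache, so each
    pass implies DRAM traffic.)
    """
    h = np.zeros(w_z.shape[1])
    outputs = []
    weight_reads = 0
    for x_t in x:
        z_t = np.tanh(x_t @ w_z)   # candidate, depends only on the input
        f_t = sigmoid(x_t @ w_f)   # forget gate, depends only on the input
        weight_reads += 1          # both matrices are read for this single frame
        h = f_t * h + (1.0 - f_t) * z_t
        outputs.append(h)
    return np.stack(outputs), weight_reads


def linear_rnn_multistep(x, w_z, w_f, block):
    """Multi-step parallel version: one matrix product per block of frames."""
    h = np.zeros(w_z.shape[1])
    outputs = []
    weight_reads = 0
    for start in range(0, len(x), block):
        x_blk = x[start:start + block]      # shape (T, d_in), T <= block
        z_blk = np.tanh(x_blk @ w_z)        # all T frames share one weight pass
        f_blk = sigmoid(x_blk @ w_f)
        weight_reads += 1
        # Only the cheap element-wise recurrence remains sequential.
        for z_t, f_t in zip(z_blk, f_blk):
            h = f_t * h + (1.0 - f_t) * z_t
            outputs.append(h)
    return np.stack(outputs), weight_reads


# Usage: both versions produce the same states, with fewer weight passes.
rng = np.random.default_rng(0)
x = rng.standard_normal((20, 8))            # 20 frames, 8 input features
w_z = rng.standard_normal((8, 16)) * 0.1
w_f = rng.standard_normal((8, 16)) * 0.1
y_ref, reads_ref = linear_rnn_stepwise(x, w_z, w_f)
y_par, reads_par = linear_rnn_multistep(x, w_z, w_f, block=4)
print(np.allclose(y_ref, y_par), reads_ref, reads_par)  # True 20 5
```

In this sketch, the number of passes over the weights drops by roughly the block size, at the cost of buffering a block of input frames before computing, which adds latency proportional to the block length. An LSTM or GRU cannot be blocked this way, because its gates multiply h_(t-1) by a recurrent weight matrix at every step.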
Pages: 376–381
Page count: 6