ON-DEVICE END-TO-END SPEECH RECOGNITION WITH MULTI-STEP PARALLEL RNNS

被引:0
|
作者
Boo, Yoonho [1 ]
Park, Jinhwan [1 ]
Lee, Lukas [1 ]
Sung, Wonyong [1 ]
机构
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
基金
新加坡国家研究基金会;
关键词
End-to-end speech recognition; multi-step parallel RNN; personal devices;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Most of the current automatic speech recognition is performed on a remote server. However, the demand for speech recognition on personal devices is increasing, owing to the requirement of shorter recognition latency and increased privacy. End-to-end speech recognition that employs recurrent neural networks (RNNs) shows good accuracy, but the execution of conventional RNNs, such as the long short-term memory (LSTM) or gated recurrent unit (GRU), demands many memory accesses, thus hindering its real-time execution on smart-phones or embedded systems. To solve this problem, we built an end-to-end acoustic model (AM) using linear recurrent units instead of LSTM or GRU and employed a multi-step parallel approach for reducing the number of DRAM accesses. The AM is trained with the connectionist temporal classification (CTC) loss, and the decoding is conducted using weighted finite-state transducers (WFSTs). The proposed system achieves x4.8 real-time speed when executed on a single core of an ARM CPU-based system.
引用
收藏
页码:376 / 381
页数:6
相关论文
共 50 条
  • [41] TRIGGERED ATTENTION FOR END-TO-END SPEECH RECOGNITION
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5666 - 5670
  • [42] Performance Monitoring for End-to-End Speech Recognition
    Li, Ruizhi
    Sell, Gregory
    Hermansky, Hynek
    INTERSPEECH 2019, 2019, : 2245 - 2249
  • [43] End-to-End Speech Recognition and Disfluency Removal
    Lou, Paria Jamshid
    Johnson, Mark
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2051 - 2061
  • [44] End-to-End Speech Recognition in Agglutinative Languages
    Mamyrbayev, Orken
    Alimhan, Keylan
    Zhumazhanov, Bagashar
    Turdalykyzy, Tolganay
    Gusmanova, Farida
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT II, 2020, 12034 : 391 - 401
  • [45] End-to-end Korean Digits Speech Recognition
    Roh, Jong-hyuk
    Cho, Kwantae
    Kim, Youngsam
    Cho, Sangrae
    2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE, 2019, : 1137 - 1139
  • [46] MIMO-SPEECH: END-TO-END MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 237 - 244
  • [47] SPEECH ENHANCEMENT USING END-TO-END SPEECH RECOGNITION OBJECTIVES
    Subramanian, Aswin Shanmugam
    Wang, Xiaofei
    Baskar, Murali Karthick
    Watanabe, Shinji
    Taniguchi, Toru
    Tran, Dung
    Fujita, Yuya
    2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2019, : 234 - 238
  • [48] End-to-End Speech Recognition Technology Based on Multi-Stream CNN
    Xiao, Hao
    Qiu, Yuan
    Fei, Rong
    Chen, Xiongbo
    Liu, Zuo
    Wu, Zongling
    2022 IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, 2022, : 1310 - 1315
  • [49] Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition
    Yang, Yuting
    Du, Binbin
    Li, Yuke
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 175 - 179
  • [50] END-TO-END MULTI-PERSON AUDIO/VISUAL AUTOMATIC SPEECH RECOGNITION
    Braga, Otavio
    Makino, Takaki
    Siohan, Olivier
    Liao, Hank
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6994 - 6998