Exploration of On-device End-to-End Acoustic Modeling with Neural Networks

Cited by: 0
Authors
Sung, Wonyong [1]
Lee, Lukas [1]
Park, Jinhwan [1]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
speech recognition; embedded systems; neural networks; multi-time step parallelization;
DOI
10.1109/sips47522.2019.9020317
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Real-time speech recognition on mobile and embedded devices is an important application of neural networks. Acoustic modeling is the fundamental part of speech recognition and is usually implemented with long short-term memory (LSTM)-based recurrent neural networks (RNNs). However, single-thread execution of an LSTM RNN is extremely slow on most embedded devices because the algorithm must fetch a large number of parameters from DRAM to compute each output sample. We explore a few acoustic modeling algorithms that can be executed very efficiently on embedded devices. These algorithms reduce the memory-access overhead through multi-time-step parallelization, which computes multiple output samples at a time while reading the parameters from DRAM only once. The algorithms considered are quasi-RNNs (QRNNs), Gated ConvNets, and diagonalized LSTMs. In addition, we explore neural networks that add a one-dimensional (1-D) convolution at each layer of these algorithms, which yields a very large performance increase for QRNNs and Gated ConvNets. The experiments were conducted using connectionist temporal classification (CTC)-based end-to-end speech recognition on the WSJ corpus. We not only significantly increase the execution speed but also obtain much higher accuracy than LSTM RNN-based modeling. Thus, this work is applicable not only to embedded system-based implementations but also to server-based ones.
Pages: 160-165
Page count: 6
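
The multi-time-step parallelization mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a simplified QRNN-style layer with filter width 1 and f-pooling, pure-Python lists instead of an optimized kernel, and a hypothetical `fetches` counter that stands in for DRAM reads of weight rows. The function names (`matvec_block`, `qrnn_layer`) and the `block` parameter are made up for illustration. The point it tries to show is that the gate pre-activations depend only on the inputs, so each weight row can be loaded once and reused for a whole block of time steps, while the remaining recurrence is element-wise and needs no weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec_block(W, xs):
    """Compute W @ x for every x in xs, touching each weight row once.

    Returns (outputs, fetches), where fetches counts how many weight rows
    were loaded. The counter is a stand-in for DRAM traffic (an assumption
    of this sketch, not a measurement).
    """
    fetches = 0
    out = [[0.0] * len(W) for _ in xs]
    for i, row in enumerate(W):        # load one weight row...
        fetches += 1
        for t, x in enumerate(xs):     # ...and reuse it for all steps in the block
            out[t][i] = sum(w * v for w, v in zip(row, x))
    return out, fetches


def qrnn_layer(Wz, Wf, xs, block=4):
    """Simplified QRNN-style layer (filter width 1, f-pooling).

    z_t = tanh(Wz x_t), f_t = sigmoid(Wf x_t),
    h_t = f_t * h_{t-1} + (1 - f_t) * z_t   (element-wise)
    """
    h = [0.0] * len(Wz)
    hs, fetches = [], 0
    for start in range(0, len(xs), block):
        chunk = xs[start:start + block]
        # Weight-dependent work for the whole block, one pass over each matrix.
        z_pre, n_z = matvec_block(Wz, chunk)
        f_pre, n_f = matvec_block(Wf, chunk)
        fetches += n_z + n_f
        # Weight-free, element-wise recurrence over the block.
        for zp, fp in zip(z_pre, f_pre):
            z = [math.tanh(v) for v in zp]
            f = [sigmoid(v) for v in fp]
            h = [fi * hi + (1.0 - fi) * zi for fi, hi, zi in zip(f, h, z)]
            hs.append(h)
    return hs, fetches


if __name__ == "__main__":
    Wz = [[0.5, -0.2], [0.1, 0.3]]
    Wf = [[0.2, 0.4], [-0.3, 0.1]]
    xs = [[float(t), 1.0] for t in range(8)]
    _, per_step = qrnn_layer(Wz, Wf, xs, block=1)
    _, blocked = qrnn_layer(Wz, Wf, xs, block=4)
    print("weight-row fetches, one step at a time:", per_step)
    print("weight-row fetches, blocks of 4 steps:", blocked)
```

With these toy sizes (8 time steps, two 2-row matrices), processing one step at a time loads 32 weight rows, while blocks of 4 steps load 8, and the hidden states are identical in both cases. A standard LSTM does not allow this reuse directly, because its gates depend on the previous hidden state through a matrix product, so the weights are needed again at every step.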