BIFOCAL NEURAL ASR: EXPLOITING KEYWORD SPOTTING FOR INFERENCE OPTIMIZATION

Cited by: 7
Authors
Macoskey, Jon [1]
Strimel, Grant P. [1]
Rastrow, Ariya [1]
Affiliation
[1] Amazon.com, Seattle, WA 98109 USA
Keywords
On-device speech recognition; recurrent neural network transducer (RNN-T); inference optimization
DOI
10.1109/ICASSP39728.2021.9414652
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present Bifocal RNN-T, a new variant of the Recurrent Neural Network Transducer (RNN-T) architecture designed for improved inference time latency on speech recognition tasks. The architecture enables a dynamic pivot for its runtime compute pathway, namely taking advantage of keyword spotting to select which component of the network to execute for a given audio frame. To accomplish this, we leverage a recurrent cell we call the Bifocal LSTM (BF-LSTM), which we detail in the paper. The architecture is compatible with other optimization strategies such as quantization, sparsification, and applying time-reduction layers, making it especially applicable for deployed, real-time speech recognition settings. We present the architecture and report comparative experimental results on voice-assistant speech recognition tasks. Specifically, we show our proposed Bifocal RNN-T can improve inference cost by 29.1% with matching word error rates and only a minor increase in memory size.
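The per-frame pivot described in the abstract can be illustrated with a small cost-accounting sketch. This is not the paper's BF-LSTM: the `Branch` type, the function names, the per-frame costs, and the routing direction (cheap branch until the keyword spotter fires, large branch afterwards) are all assumptions made for illustration, and the sketch does not model how recurrent state would be shared between branches.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Branch:
    """Hypothetical network component with a fixed per-frame compute cost."""
    name: str
    cost_per_frame: float  # arbitrary compute units, assumed for illustration


def select_branch(keyword_detected: bool, small: Branch, large: Branch) -> Branch:
    # Assumed routing rule: run the cheap branch until the keyword spotter
    # has fired, then switch to the large branch for the remaining frames.
    return large if keyword_detected else small


def route_frames(detections: List[bool], small: Branch, large: Branch) -> List[str]:
    """Return the name of the branch chosen for each audio frame."""
    return [select_branch(d, small, large).name for d in detections]


def relative_saving(detections: List[bool], small: Branch, large: Branch) -> float:
    """Fraction of compute saved versus running the large branch on every frame."""
    baseline = len(detections) * large.cost_per_frame
    actual = sum(select_branch(d, small, large).cost_per_frame for d in detections)
    return 1.0 - actual / baseline


if __name__ == "__main__":
    small = Branch("small", cost_per_frame=1.0)
    large = Branch("large", cost_per_frame=4.0)
    # Made-up utterance: 3 frames before the keyword spotter fires, 7 after.
    detections = [False] * 3 + [True] * 7
    print(route_frames(detections, small, large))
    print(f"relative saving: {relative_saving(detections, small, large):.3f}")
```

The numbers here are invented and are unrelated to the 29.1% figure reported in the abstract; the sketch only shows how a frame-level routing decision translates into an aggregate compute saving.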
Pages: 5999-6003
Page count: 5