Endpoint Detection using Grid Long Short-Term Memory Networks for Streaming Speech Recognition

Cited by: 13
Authors
Chang, Shuo-Yiin [1]
Li, Bo [1]
Sainath, Tara N. [1]
Simko, Gabor [1]
Parada, Carolina [1]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
DOI
10.21437/Interspeech.2017-284
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The task of endpointing is to determine when the user has finished speaking. This is important for interactive speech applications such as voice search and Google Home. In this paper, we propose a GLDNN-based (grid long short-term memory deep neural network) endpointer model and show that it provides significant improvements over a state-of-the-art CLDNN (convolutional, long short-term memory, deep neural network) model. Specifically, we replace the convolution layer in the CLDNN with a grid LSTM layer that models both spectral and temporal variations through recurrent connections. Results show that the GLDNN achieves a 32% relative improvement in false alarm rate at a fixed false reject rate of 2%, and reduces median latency by 11%. We also include detailed experiments investigating why grid LSTMs offer better performance than convolution layers. Analysis reveals that the recurrent connection along the frequency axis is an important factor that greatly contributes to the performance of grid LSTMs, especially in the presence of background noise. Finally, we also show that multichannel input further increases robustness to background speech. Overall, we achieve a 16% (100 ms) endpointer latency improvement relative to our previous best model on a Voice Search Task.
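The abstract reports a false alarm rate at a fixed false reject rate and quotes gains as relative improvements. The sketch below shows one way such figures could be computed. It is an illustration under assumptions, not the authors' evaluation code: the function names, the score convention (higher means "end of query") and the toy scores are all hypothetical, and none of the numbers come from the paper.

```python
def false_alarm_at_fixed_false_reject(pos_scores, neg_scores, target_fr=0.02):
    """Pick the highest threshold whose false reject rate stays within target_fr.

    Assumed convention (hypothetical, not from the paper):
      pos_scores: scores for true end-of-query events (should be accepted)
      neg_scores: scores for non-endpoint events (should be rejected)
      an event is accepted when its score >= threshold
    Returns (threshold, false_alarm_rate, false_reject_rate).
    """
    best = None
    for threshold in sorted(set(pos_scores) | set(neg_scores)):
        fr = sum(s < threshold for s in pos_scores) / len(pos_scores)
        if fr > target_fr:
            break  # false rejects only increase as the threshold rises
        fa = sum(s >= threshold for s in neg_scores) / len(neg_scores)
        best = (threshold, fa, fr)
    return best


def relative_improvement(baseline, new):
    """Fractional reduction of an error-like metric relative to a baseline."""
    return (baseline - new) / baseline


# Toy scores chosen only to make the arithmetic easy to follow.
pos = [0.9] * 98 + [0.1] * 2    # 2 of 100 true endpoints score low
neg = [0.95] * 5 + [0.2] * 95   # 5 of 100 non-endpoints score high
threshold, fa, fr = false_alarm_at_fixed_false_reject(pos, neg)
print(threshold, fa, fr)
print(relative_improvement(0.05, 0.034))
```

Fixing the false reject rate first and then comparing false alarm rates puts two models at the same operating point, which is why the abstract can quote a single relative figure for the comparison.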
Pages: 3812-3816
Page count: 5