TOWARDS FAST AND ACCURATE STREAMING END-TO-END ASR

Cited by: 0
Authors
Li, Bo [1]
Chang, Shuo-yiin [1]
Sainath, Tara N. [1]
Pang, Ruoming [1]
He, Yanzhang [1]
Strohman, Trevor [1]
Wu, Yonghui [1]
Affiliation
[1] Google LLC, Mountain View, CA 94043 USA
Keywords
RNN-T; Endpointer; Latency;
DOI
10.1109/icassp40776.2020.9054715
Chinese Library Classification
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
End-to-end (E2E) models fold the acoustic, pronunciation and language models of a conventional speech recognition system into one neural network with far fewer parameters than a conventional ASR system, which makes them suitable for on-device applications. For example, the recurrent neural network transducer (RNN-T), a streaming E2E model, has shown promising potential for on-device ASR [1]. For such applications, quality and latency are two critical factors. We propose to reduce the E2E model's latency by extending the RNN-T endpointer (RNN-T EP) model [2] with additional early and late penalties. By further applying the minimum word error rate (MWER) training technique [3], we achieved an 8.0% relative word error rate (WER) reduction and a 130ms 90-percentile latency reduction over [2] on a Voice Search test set. We also experimented with a second-pass Listen, Attend and Spell (LAS) rescorer [4]. Although it did not directly improve the first-pass latency, the large WER reduction provides extra room to trade WER for latency. RNN-T EP+LAS, together with MWER training, brings an 18.7% relative WER reduction and a 160ms 90-percentile latency reduction compared to the originally proposed RNN-T EP [2] model.
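The two headline metrics in the abstract, relative WER reduction and 90-percentile latency, can be illustrated with a minimal Python sketch. The function names, the nearest-rank percentile definition, and all numbers below are assumptions chosen for illustration; they are not taken from the paper, which does not state how its percentile was computed.

```python
import math


def relative_reduction(baseline, improved):
    """Fractional reduction of a metric relative to its baseline value."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it.

    Assumption: the nearest-rank definition is used here for simplicity;
    other percentile conventions interpolate between samples.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical WERs (in percent) for a baseline and an improved system.
wer_reduction = relative_reduction(10.0, 9.2)

# Hypothetical per-utterance endpointer latencies in milliseconds.
latencies_ms = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
p90_ms = percentile_nearest_rank(latencies_ms, 90)
```

A reduction in 90-percentile latency between two systems would then be the difference of their `p90_ms` values over the same test set.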
Pages: 6069-6073
Number of pages: 5