TOWARDS FAST AND ACCURATE STREAMING END-TO-END ASR

Authors
Li, Bo [1]
Chang, Shuo-yiin [1]
Sainath, Tara N. [1]
Pang, Ruoming [1]
He, Yanzhang [1]
Strohman, Trevor [1]
Wu, Yonghui [1]
Affiliations
[1] Google LLC, Mountain View, CA 94043 USA
Keywords
RNN-T; Endpointer; Latency;
DOI
10.1109/icassp40776.2020.9054715
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
End-to-end (E2E) models fold the acoustic, pronunciation and language models of a conventional speech recognition system into one neural network with far fewer parameters than a conventional ASR system, which makes them suitable for on-device applications. For example, the recurrent neural network transducer (RNN-T), a streaming E2E model, has shown promising potential for on-device ASR [1]. For such applications, quality and latency are two critical factors. We propose to reduce the E2E model's latency by extending the RNN-T endpointer (RNN-T EP) model [2] with additional early and late penalties. By further applying the minimum word error rate (MWER) training technique [3], we achieved an 8.0% relative word error rate (WER) reduction and a 130 ms reduction in 90th-percentile latency over [2] on a Voice Search test set. We also experimented with a second-pass Listen, Attend and Spell (LAS) rescorer [4]. Although it did not directly improve the first-pass latency, the large WER reduction provides extra room to trade WER for latency. RNN-T EP+LAS together with MWER training brings an 18.7% relative WER reduction and a 160 ms reduction in 90th-percentile latency compared to the originally proposed RNN-T EP model [2].
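The abstract reports its results as relative WER reductions and 90th-percentile latency reductions. As a rough illustration of how such figures are typically computed, here is a minimal Python sketch. The function names, the nearest-rank percentile definition, and all numbers are assumptions made for illustration; none of them come from the paper, which does not state its exact percentile method here.

```python
import math


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` versus `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


def percentile_nearest_rank(values, pct: int):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical WERs (percent) for a baseline and an improved system.
baseline_wer, improved_wer = 10.0, 9.2
rel = relative_reduction(baseline_wer, improved_wer)

# Hypothetical per-utterance endpointer latencies in milliseconds.
latencies_ms = [300, 420, 510, 640, 700, 750, 810, 900, 950, 1200]
p90 = percentile_nearest_rank(latencies_ms, 90)

print(f"relative WER reduction: {rel:.1f}%")
print(f"90th-percentile latency: {p90} ms")
```

A latency reduction between two systems would then be the difference between their 90th-percentile values computed on the same test set.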
Pages: 6069-6073
Page count: 5
Related papers
50 in total
  • [1] A BETTER AND FASTER END-TO-END MODEL FOR STREAMING ASR
    Li, Bo
    Gulati, Anmol
    Yu, Jiahui
    Sainath, Tara N.
    Chiu, Chung-Cheng
    Narayanan, Arun
    Chang, Shuo-Yiin
    Pang, Ruoming
    He, Yanzhang
    Qin, James
    Han, Wei
    Liang, Qiao
    Zhang, Yu
    Strohman, Trevor
    Wu, Yonghui
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5634 - 5638
  • [2] Towards Lifelong Learning of End-to-end ASR
    Chang, Heng-Jui
    Lee, Hung-yi
    Lee, Lin-shan
    INTERSPEECH 2021, 2021, : 2551 - 2555
  • [3] Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems
    Joshi, Vikas
    Das, Amit
    Sun, Eric
    Mehta, Rupesh R.
    Li, Jinyu
    Gong, Yifan
    INTERSPEECH 2021, 2021, : 1767 - 1771
  • [4] ENDPOINT DETECTION FOR STREAMING END-TO-END MULTI-TALKER ASR
    Lu, Liang
    Li, Jinyu
    Gong, Yifan
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7312 - 7316
  • [5] COMPARATIVE STUDY OF DIFFERENT TOKENIZATION STRATEGIES FOR STREAMING END-TO-END ASR
    Singh, Sachin
    Gupta, Ashutosh
    Maghan, Aman
    Gowda, Dhananjaya
    Singh, Shatrughan
    Kim, Chanwoo
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 388 - 394
  • [6] Towards a Livvi-Karelian End-to-End ASR System
    Kipyatkova, Irina
    Kagirov, Ildar
    Dolgushin, Mikhail
    Rodionova, Alexandra
    SPEECH AND COMPUTER, SPECOM 2024, PT I, 2025, 15299 : 57 - 68
  • [7] Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models
    Wang, Tianzi
    Fujita, Yuya
    Chang, Xuankai
    Watanabe, Shinji
    INTERSPEECH 2021, 2021, : 3755 - 3759
  • [8] DOES SPEECH ENHANCEMENT WORK WITH END-TO-END ASR OBJECTIVES?: EXPERIMENTAL ANALYSIS OF MULTICHANNEL END-TO-END ASR
    Ochiai, Tsubasa
    Watanabe, Shinji
    Katagiri, Shigeru
    2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2017,
  • [9] TOWARDS CODE-SWITCHING ASR FOR END-TO-END CTC MODELS
    Li, Ke
    Li, Jinyu
    Ye, Guoli
    Zhao, Rui
    Gong, Yifan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6076 - 6080
  • [10] Streaming End-to-End ASR Using CTC Decoder and DRA for Linguistic Information Substitution
    Takagi, Tatsunari
    Ogawa, Atsunori
    Kitaoka, Norihide
    Wakabayashi, Yukoh
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1779 - 1783