PERSONALIZATION STRATEGIES FOR END-TO-END SPEECH RECOGNITION SYSTEMS

被引:11
|
作者
Gourav, Aditya [1 ]
Liu, Linda [1 ]
Gandhe, Ankur [1 ]
Gu, Yile [1 ]
Lan, Guitang [1 ]
Huang, Xiangyang [1 ]
Kalmane, Shashank [1 ]
Tiwari, Gautam [1 ]
Filimonov, Denis [1 ]
Rastrow, Ariya [1 ]
Stolcke, Andreas [1 ]
Bulyko, Ivan [1 ]
Alexa, Amazon [1 ]
机构
[1] Amazon Alexa, Seattle, WA 98109 USA
关键词
language modeling; automatic speech recognition; rescoring; shallow fusion; personalization;
D O I
10.1109/ICASSP39728.2021.9413962
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
The recognition of personalized content, such as contact names, remains a challenging problem for end-to-end speech recognition systems. In this work, we demonstrate how first- and second-pass rescoring strategies can be leveraged together to improve the recognition of such words. Following previous work, we use a shallow fusion approach to bias towards recognition of personalized content in the first-pass decoding. We show that such an approach can improve personalized content recognition by up to 16% with minimum degradation on the general use case. We describe a fast and scalable algorithm that enables our biasing models to remain at the word-level, while applying the biasing at the subword level. This has the advantage of not requiring the biasing models to be dependent on any subword symbol table. We also describe a novel second-pass de-biasing approach: used in conjunction with a first-pass shallow fusion that optimizes on oracle WER, we can achieve an additional 14% improvement on personalized content recognition, and even improve accuracy for the general use case by up to 2.5%.
引用
收藏
页码:7348 / 7352
页数:5
相关论文
共 50 条
  • [21] An End-to-End model for Vietnamese speech recognition
    Van Huy Nguyen
    [J]. 2019 IEEE - RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF), 2019, : 307 - 312
  • [22] END-TO-END VISUAL SPEECH RECOGNITION WITH LSTMS
    Petridis, Stavros
    Li, Zuwei
    Pantic, Maja
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2592 - 2596
  • [23] SYNCHRONOUS TRANSFORMERS FOR END-TO-END SPEECH RECOGNITION
    Tian, Zhengkun
    Yi, Jiangyan
    Bai, Ye
    Tao, Jianhua
    Zhang, Shuai
    Wen, Zhengqi
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7884 - 7888
  • [24] End-to-End Speech Recognition For Arabic Dialects
    Seham Nasr
    Rehab Duwairi
    Muhannad Quwaider
    [J]. Arabian Journal for Science and Engineering, 2023, 48 : 10617 - 10633
  • [25] End-to-End Speech Recognition of Tamil Language
    Changrampadi, Mohamed Hashim
    Shahina, A.
    Narayanan, M. Badri
    Khan, A. Nayeemulla
    [J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (02): : 1309 - 1323
  • [26] PARAMETER UNCERTAINTY FOR END-TO-END SPEECH RECOGNITION
    Braun, Stefan
    Liu, Shih-Chii
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5636 - 5640
  • [27] Review of End-to-End Streaming Speech Recognition
    Wang, Aohui
    Zhang, Long
    Song, Wenyu
    Meng, Jie
    [J]. Computer Engineering and Applications, 2024, 59 (02) : 22 - 33
  • [28] End-to-End Speech Recognition For Arabic Dialects
    Nasr, Seham
    Duwairi, Rehab
    Quwaider, Muhannad
    [J]. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2023, 48 (08) : 10617 - 10633
  • [29] TOWARDS END-TO-END UNSUPERVISED SPEECH RECOGNITION
    Liu, Alexander H.
    Hsu, Wei-Ning
    Auli, Michael
    Baevski, Alexei
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 221 - 228
  • [30] TRIGGERED ATTENTION FOR END-TO-END SPEECH RECOGNITION
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5666 - 5670