PERSONALIZATION STRATEGIES FOR END-TO-END SPEECH RECOGNITION SYSTEMS

Cited by: 11
Authors
Gourav, Aditya [1]
Liu, Linda [1]
Gandhe, Ankur [1]
Gu, Yile [1]
Lan, Guitang [1]
Huang, Xiangyang [1]
Kalmane, Shashank [1]
Tiwari, Gautam [1]
Filimonov, Denis [1]
Rastrow, Ariya [1]
Stolcke, Andreas [1]
Bulyko, Ivan [1]
Affiliation
[1] Amazon Alexa, Seattle, WA 98109 USA
Keywords
language modeling; automatic speech recognition; rescoring; shallow fusion; personalization
DOI
10.1109/ICASSP39728.2021.9413962
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Recognizing personalized content, such as contact names, remains a challenging problem for end-to-end speech recognition systems. In this work, we show how first- and second-pass rescoring strategies can be combined to improve the recognition of such words. Following previous work, we use shallow fusion to bias first-pass decoding toward personalized content. We show that this approach improves personalized content recognition by up to 16% with minimal degradation in the general use case. We describe a fast and scalable algorithm that keeps our biasing models at the word level while applying the biasing at the subword level, so the biasing models need not depend on any subword symbol table. We also describe a novel second-pass de-biasing approach: used together with a first-pass shallow fusion tuned to optimize oracle WER, it yields an additional 14% improvement on personalized content recognition and even improves accuracy in the general use case by up to 2.5%.
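The abstract does not spell out the algorithms, so the Python block below is only a minimal sketch under our own assumptions, not the authors' implementation. It assumes SentencePiece-style subword pieces in which "▁" starts a new word, keeps the biasing list (e.g., contact names) as whole words, and rewards a hypothesis for each matched character of a word prefix, taking the reward back if the word turns out not to be on the list. This prefix-reward-and-retract scheme is a common contextual-biasing technique, swapped in here for illustration. The first-pass score is the E2E log-probability plus this reward (shallow fusion). `debias_rerank` is one plausible reading of "de-biasing": subtract the first-pass reward, then add a second-pass rescoring-model score. All names (`WordLevelBiaser`, `first_pass`, `debias_rerank`, `weight_per_char`, `beta`) and all scores are hypothetical.

```python
from dataclasses import dataclass

WORD_START = "\u2581"  # assumption: SentencePiece-style marker that begins a word


@dataclass(frozen=True)
class BiasState:
    partial: str = ""    # characters of the word currently being decoded
    bonus: float = 0.0   # reward granted so far to `partial`


class WordLevelBiaser:
    """Hypothetical word-level biasing list applied one subword piece at a time.

    Only character prefixes of the words are stored, so building or updating
    the list never touches a subword symbol table.
    """

    def __init__(self, words, weight_per_char=0.5):
        self.words = set(words)
        self.prefixes = {w[:i] for w in self.words for i in range(1, len(w) + 1)}
        self.weight = weight_per_char  # hypothetical fusion weight per matched char

    def finish(self, state):
        """Keep the reward only if the current word is a complete biasing word."""
        return 0.0 if state.partial in self.words else -state.bonus

    def step(self, state, piece):
        """Consume one subword piece; return (new_state, score_delta)."""
        delta = 0.0
        if piece.startswith(WORD_START):
            delta += self.finish(state)  # close the previous word first
            state = BiasState()
            piece = piece[len(WORD_START):]
        partial = state.partial + piece
        if partial in self.prefixes:
            gain = self.weight * len(piece)
            return BiasState(partial, state.bonus + gain), delta + gain
        # The word has left the biasing list: take back its partial reward.
        return BiasState(partial, 0.0), delta - state.bonus


def first_pass(biaser, hypotheses):
    """Shallow fusion: E2E log-prob plus biasing reward, best first.

    `hypotheses` maps a tuple of subword pieces to per-piece E2E log-probs;
    returns (text, fused_score, bias_reward) tuples.
    """
    nbest = []
    for pieces, logprobs in hypotheses.items():
        state, bias = BiasState(), 0.0
        for piece in pieces:
            state, delta = biaser.step(state, piece)
            bias += delta
        bias += biaser.finish(state)
        text = "".join(pieces).replace(WORD_START, " ").strip()
        nbest.append((text, sum(logprobs) + bias, bias))
    return sorted(nbest, key=lambda h: h[1], reverse=True)


def debias_rerank(nbest, second_pass_scores, beta=1.0):
    """Hypothetical second pass: remove the first-pass bias reward, add a
    rescoring-model log-prob, and return the texts best first."""
    rescored = [(fused - bias + beta * second_pass_scores[text], text)
                for text, fused, bias in nbest]
    return [text for _, text in sorted(rescored, reverse=True)]


# Toy example with made-up scores.
W = WORD_START
hyps = {
    (W + "call", W + "adi", "tya"): [-0.25, -2.0, -1.5],  # completes "aditya"
    (W + "call", W + "ad", "am"): [-0.25, -1.0, -0.5],    # matches "ad", then diverges
}
nbest = first_pass(WordLevelBiaser(["aditya"]), hyps)
print(nbest)
print(debias_rerank(nbest, {"call aditya": -1.0, "call adam": -4.0}))
```

With the biasing list, "call aditya" ranks first (fused score -0.75 versus -1.75); with an empty list, "call adam" would win (-1.75 versus -3.75). The reward granted to "ad" is retracted once the word becomes "adam", so partial matches do not leak into the final score. Because only character prefixes of words are stored, changing the subword vocabulary does not require rebuilding the biasing list. A real decoder would call `step` inside beam search rather than on finished hypotheses.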
Pages: 7348-7352
Page count: 5