Contextual Biasing for End-to-End Chinese ASR

Cited by: 0
Authors
Zhang, Kai [1]
Zhang, Qiuxia [2]
Wang, Chung-Che [2]
Jang, Jyh-Shing Roger [2]
Affiliations
[1] Yiwu Ind & Commercial Coll, Yiwu 322000, Zhejiang, Peoples R China
[2] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
Source
IEEE Access | 2024 / Vol. 12
Keywords
Speech recognition; Task analysis; Hidden Markov models; Context modeling; Data models; Training data; Automatic speech recognition; Classification algorithms; Decoding; context biasing; intent classification; hotwords
DOI
10.1109/ACCESS.2024.3424260
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
End-to-end speech recognition is more robust than conventional methods and improves recognition accuracy across diverse contexts. However, because it lacks an independent language model, it struggles to recognize vocabulary absent from the training data, which degrades recognition of certain domain-specific terms. Adapting to different scenarios therefore requires biasing the model toward specific domains. Based on the CATSLU dataset, this study constructs two Chinese contextual biasing tasks, targeting proper nouns and mixed-domain sentences. It also explores four contextual biasing methods applied at different stages of the speech recognition process: pre-recognition, within the model, decoding, and post-processing. Experimental results indicate that every biasing method improved, to some extent, the model's recognition performance within specific domains.
Pages: 92960-92975
Page count: 16