Contextual Biasing for End-to-End Chinese ASR

Authors
Zhang, Kai [1]
Zhang, Qiuxia [2]
Wang, Chung-Che [2]
Jang, Jyh-Shing Roger [2]
Affiliations
[1] Yiwu Ind & Commercial Coll, Yiwu 322000, Zhejiang, Peoples R China
[2] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
Source
IEEE Access | 2024 / Vol. 12
Keywords
Speech recognition; Task analysis; Hidden Markov models; Context modeling; Data models; Training data; Automatic speech recognition; Classification algorithms; Decoding; context biasing; intent classification; hotwords;
DOI
10.1109/ACCESS.2024.3424260
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
End-to-end speech recognition is more robust than conventional methods and improves recognition accuracy across diverse contexts. However, because it lacks an independent language model, it struggles to recognize vocabulary absent from the training data, which degrades recognition of certain domain-specific terms. Adapting to different scenarios therefore requires a shift toward specific domains. Based on the CATSLU dataset, this study constructs two Chinese contextual biasing tasks, one targeting proper nouns and one targeting mixed-domain sentences. It also explores four contextual biasing methods applied at different stages of the speech recognition process: pre-recognition, within the model, decoding, and post-processing. Experimental results indicate that all four biasing methods improve, to some extent, the recognition performance of the speech recognition model within specific domains.
Pages: 92960-92975
Page count: 16