Shallow-Fusion End-to-End Contextual Biasing

Cited by: 54
Authors
Zhao, Ding [1 ]
Sainath, Tara N. [1 ]
Rybach, David [1 ]
Rondon, Pat [1 ]
Bhatia, Deepti [1 ]
Li, Bo [1 ]
Pang, Ruoming [1 ]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
DOI
10.21437/Interspeech.2019-1209
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
Contextual biasing to a specific domain, including a user's song names, app names and contact names, is an important component of any production-level automatic speech recognition (ASR) system. Contextual biasing is particularly challenging in end-to-end models because these models keep only a small list of candidates during beam search and also perform poorly on proper nouns, which are the main source of biasing phrases. In this paper, we present various algorithmic and training improvements to shallow-fusion-based biasing for end-to-end models. We show that the proposed approach outperforms a state-of-the-art conventional model across a variety of tasks, the first time this has been demonstrated.
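The abstract does not spell out the mechanics, so the following is only a minimal sketch of the general shallow-fusion idea, not the paper's method. It assumes the fused score of a hypothesis is its end-to-end log-probability plus a weighted biasing score, and it contrasts fusing before beam pruning with fusing after pruning, which illustrates why a small candidate list makes biasing hard. The hypotheses, scores, the weight `LAM`, and the helper names `shallow_fusion` and `prune` are all hypothetical.

```python
def shallow_fusion(e2e_scores, bias_scores, lam):
    """Return fused scores: s(y) = log P_e2e(y|x) + lam * bias_score(y).

    Hypotheses without a biasing entry get a biasing score of 0.0
    (an assumption made for this sketch).
    """
    return {hyp: s + lam * bias_scores.get(hyp, 0.0)
            for hyp, s in e2e_scores.items()}


def prune(scores, beam_size):
    """Keep only the beam_size highest-scoring hypotheses."""
    best = sorted(scores, key=scores.get, reverse=True)[:beam_size]
    return {hyp: scores[hyp] for hyp in best}


# Hypothetical end-to-end log-probabilities for three candidate transcripts.
e2e_scores = {"call john": -0.7, "call joan": -1.0, "call jon": -1.6}
# Hypothetical biasing score: "joan" is in the user's contact list.
bias_scores = {"call joan": 1.0}
LAM = 0.5  # hypothetical interpolation weight

# Fusing before pruning lets the biased hypothesis survive a beam of size 1.
fused_then_pruned = prune(shallow_fusion(e2e_scores, bias_scores, LAM), beam_size=1)

# Pruning first discards "call joan" before the biasing score can help it.
pruned_then_fused = shallow_fusion(prune(e2e_scores, beam_size=1), bias_scores, LAM)

print(fused_then_pruned)
print(pruned_then_fused)
```

In this toy setting the contact name is recovered only when the biasing score is applied before the beam is pruned; real systems typically score at the subword level with a weighted finite-state transducer or similar structure, which this sketch does not attempt to model.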
Pages: 1418-1422
Page count: 5
Related papers
  • [1] Contextual Biasing for End-to-End Chinese ASR
    Zhang, Kai
    Zhang, Qiuxia
    Wang, Chung-Che
    Jang, Jyh-Shing Roger
    [J]. IEEE ACCESS, 2024, 12: 92960-92975
  • [2] Class LM and Word Mapping for Contextual Biasing in End-to-End ASR
    Huang, Rongqing
    Abdel-hamid, Ossama
    Li, Xinwei
    Evermann, Gunnar
    [J]. INTERSPEECH 2020, 2020: 4348-4351
  • [3] Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
    Duc Le
    Jain, Mahaveer
    Keren, Gil
    Kim, Suyoun
    Shi, Yangyang
    Mahadeokar, Jay
    Chan, Julian
    Shangguan, Yuan
    Fuegen, Christian
    Kalinli, Ozlem
    Saraf, Yatharth
    Seltzer, Michael L.
    [J]. INTERSPEECH 2021, 2021: 1772-1776
  • [4] NAM+: TOWARDS SCALABLE END-TO-END CONTEXTUAL BIASING FOR ADAPTIVE ASR
    Munkhdalai, Tsendsuren
    Wu, Zelin
    Pundak, Golan
    Sim, Khe Chai
    Li, Jiayang
    Rondon, Pat
    Sainath, Tara N.
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022: 190-196
  • [5] END-TO-END SHALLOW NETWORK FOR VARIATIONAL PANSHARPENING
    Tomas-Cruz, Marc
    Mifdal, Jamila
    Coll, Bartomeu
    Duran, Joan
    [J]. IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023: 6803-6806
  • [6] INVESTIGATIONS ON END-TO-END AUDIOVISUAL FUSION
    Wand, Michael
    Ngoc Thang Vu
    Schmidhuber, Juergen
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 3041-3045
  • [7] DEEP CONTEXT: END-TO-END CONTEXTUAL SPEECH RECOGNITION
    Pundak, Golan
    Sainath, Tara N.
    Prabhavalkar, Rohit
    Kannan, Anjuli
    Zhao, Ding
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018: 418-425
  • [8] End-to-end Contextual Perception and Prediction with Interaction Transformer
    Li, Lingyun Luke
    Yang, Bin
    Liang, Ming
    Zeng, Wenyuan
    Ren, Mengye
    Segal, Sean
    Urtasun, Raquel
    [J]. 2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020: 5784-5791
  • [9] End-to-End FusVAE for Face Image Fusion
    Li, Xiang
    Chen, Bo
    Wen, Meijin
    Wang, Haoshuang
    [J]. 2018 INTERNATIONAL CONFERENCE ON ALGORITHMS, COMPUTING AND ARTIFICIAL INTELLIGENCE (ACAI 2018), 2018
  • [10] Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR
    Chen, Zhehuai
    Jain, Mahaveer
    Wang, Yongqiang
    Seltzer, Michael L.
    Fuegen, Christian
    [J]. INTERSPEECH 2019, 2019: 3490-3494