NAM+: TOWARDS SCALABLE END-TO-END CONTEXTUAL BIASING FOR ADAPTIVE ASR

Cited: 2
Authors
Munkhdalai, Tsendsuren [1 ]
Wu, Zelin [1 ]
Pundak, Golan [1 ]
Sim, Khe Chai [1 ]
Li, Jiayang [1 ]
Rondon, Pat [1 ]
Sainath, Tara N. [1 ]
Affiliations
[1] Google LLC, Mountain View, CA 94043 USA
Keywords
speech recognition; on-device learning; fast contextual adaptation
DOI
10.1109/SLT54892.2023.10023323
CLC Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Attention-based biasing techniques for end-to-end ASR systems are able to achieve large accuracy gains without requiring the inference algorithm adjustments and parameter tuning common to fusion approaches. However, it is challenging to simultaneously scale up attention-based biasing to realistic numbers of biased phrases; maintain in-domain WER gains, while minimizing out-of-domain losses; and run in real time. We present NAM+, an attention-based biasing approach which achieves a 16X inference speedup per acoustic frame over prior work when run with 3,000 biasing entities, as measured on a typical mobile CPU. NAM+ achieves these run-time gains through a combination of Two-Pass Hierarchical Attention and Dilated Context Update. Compared to the adapted baseline, NAM+ further decreases the in-domain WER by up to 12.6% relative, while incurring an out-of-domain WER regression of 20% relative. Compared to the non-adapted baseline, the out-of-domain WER regression is 7.1% relative.
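The abstract credits the speedup to Two-Pass Hierarchical Attention but does not detail the mechanism here. The following is a toy sketch of the general idea of hierarchical attention over biasing-phrase embeddings: score cheap group summaries first, then run full attention only inside the best-scoring group, so per-frame cost stays sublinear in the number of phrases. The function name, the use of group-mean summaries, and all shapes are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def two_pass_hierarchical_attention(frame, phrases, group_size=64):
    """Toy two-pass attention: frame is a (d,) acoustic query,
    phrases is an (n, d) matrix of biasing-phrase embeddings.
    NOTE: this is a hypothetical sketch, not NAM+'s algorithm."""
    n, d = phrases.shape
    n_groups = -(-n // group_size)          # ceil division
    pad = n_groups * group_size - n
    padded = np.vstack([phrases, np.zeros((pad, d))]) if pad else phrases
    groups = padded.reshape(n_groups, group_size, d)
    # Pass 1: attend over per-group summaries (here: the mean embedding).
    summaries = groups.mean(axis=1)                         # (n_groups, d)
    group_scores = softmax(summaries @ frame / np.sqrt(d))  # (n_groups,)
    top = int(group_scores.argmax())
    # Pass 2: full attention only within the highest-scoring group,
    # so cost per frame is O(n_groups + group_size) instead of O(n).
    member_scores = softmax(groups[top] @ frame / np.sqrt(d))
    return member_scores @ groups[top]                      # (d,) context
```

With 3,000 phrases and groups of 64, pass 1 scores 47 summaries and pass 2 scores 64 members, versus 3,000 dot products for flat attention over every phrase.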
Pages: 190-196 (7 pages)
Related Papers (50 in total)
  • [1] Contextual Biasing for End-to-End Chinese ASR
    Zhang, Kai
    Zhang, Qiuxia
    Wang, Chung-Che
    Jang, Jyh-Shing Roger
    [J]. IEEE ACCESS, 2024, 12 : 92960 - 92975
  • [2] Class LM and Word Mapping for Contextual Biasing in End-to-End ASR
    Huang, Rongqing
    Abdel-hamid, Ossama
    Li, Xinwei
    Evermann, Gunnar
    [J]. INTERSPEECH 2020, 2020, : 4348 - 4351
  • [3] Shallow-Fusion End-to-End Contextual Biasing
    Zhao, Ding
    Sainath, Tara N.
    Rybach, David
    Rondon, Pat
    Bhatia, Deepti
    Li, Bo
    Pang, Ruoming
    [J]. INTERSPEECH 2019, 2019, : 1418 - 1422
  • [4] Towards Lifelong Learning of End-to-end ASR
    Chang, Heng-Jui
    Lee, Hung-yi
    Lee, Lin-shan
    [J]. INTERSPEECH 2021, 2021, : 2551 - 2555
  • [5] Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR
    Chen, Zhehuai
    Jain, Mahaveer
    Wang, Yongqiang
    Seltzer, Michael L.
    Fuegen, Christian
    [J]. INTERSPEECH 2019, 2019, : 3490 - 3494
  • [6] TOWARDS FAST AND ACCURATE STREAMING END-TO-END ASR
    Li, Bo
    Chang, Shuo-yiin
    Sainath, Tara N.
    Pang, Ruoming
    He, Yanzhang
    Strohman, Trevor
    Wu, Yonghui
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6069 - 6073
  • [7] End-to-End ASR with Adaptive Span Self-Attention
    Chang, Xuankai
    Subramanian, Aswin Shanmugam
    Guo, Pengcheng
    Watanabe, Shinji
    Fujita, Yuya
    Omachi, Motoi
    [J]. INTERSPEECH 2020, 2020, : 3595 - 3599
  • [8] Timestamp-aligning and keyword-biasing end-to-end ASR front-end for a KWS system
    Shi, Gui-Xin
    Zhang, Wei-Qiang
    Wang, Guan-Bo
    Zhao, Jing
    Chai, Shu-Zhou
    Zhao, Ze-Yu
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)
  • [9] DOES SPEECH ENHANCEMENT WORK WITH END-TO-END ASR OBJECTIVES?: EXPERIMENTAL ANALYSIS OF MULTICHANNEL END-TO-END ASR
    Ochiai, Tsubasa
    Watanabe, Shinji
    Katagiri, Shigeru
    [J]. 2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2017,