META-LEARNING FOR IMPROVING RARE WORD RECOGNITION IN END-TO-END ASR

Cited by: 3
Authors
Lux, Florian [1]
Vu, Ngoc Thang [1]
Affiliation
[1] Univ Stuttgart, Inst Nat Language Proc, D-70569 Stuttgart, Germany
Keywords
meta learning; keyword spotting; speech recognition; speech embedding
DOI
10.1109/ICASSP39728.2021.9414298
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this work we take on the challenge of rare word recognition in end-to-end (E2E) automatic speech recognition (ASR) by integrating a meta learning mechanism into an E2E ASR system, enabling few-shot adaptation. We make three contributions: a novel method of generating embeddings for speech; changes to four meta learning approaches that enable them to perform keyword spotting; and an approach to using their outcomes in an E2E ASR system. We verify the functionality of each of these contributions in two experiments that explore their performance for different numbers of classes (N-way) and examples per class (k-shot) in a few-shot setting. We find that the information encoded in the speech embeddings suffices to allow the modified meta learning approaches to perform continuous signal spotting. Despite the simplicity of the interface between keyword spotting and speech recognition, we are able to consistently improve word error rate by up to 5%.
Pages: 5974-5978
Number of pages: 5