Few-shot learning for E2E speech recognition: architectural variants for support set generation

Authors
Eledath, Dhanya [1]
Thurlapati, Narasimha Rao [2]
Pavithra, V [2]
Banerjee, Tirthankar [3]
Ramasubramanian, V [3]
Affiliations
[1] Int Inst Informat Technol Bangalore IIITB, Bangalore, Karnataka, India
[2] Samsung R&D Inst Bangalore SRI B, Bangalore, Karnataka, India
[3] IIIT Bangalore, Bangalore, Karnataka, India
Keywords
Few-shot Learning; Matching Networks; Continuous Speech Recognition; Coupled and Uncoupled architectures; Support Set Generation
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose two architectural variants of our recent adaptation of the few-shot learning (FSL) framework 'Matching Networks' (MN) to end-to-end (E2E) continuous speech recognition (CSR). This adaptation, termed 'MN-CTC', involves CTC-loss-based end-to-end episodic training of MN and an associated CTC-based decoding of continuous speech. An important component of MN theory is the labelled support set used during training and inference. The architectural variants proposed and studied here for E2E CSR, namely the 'Uncoupled MN-CTC' and the 'Coupled MN-CTC', address the problem of generating supervised support sets from continuous speech. The 'Uncoupled MN-CTC' generates the support sets outside the MN architecture. The 'Coupled MN-CTC' is a derivative framework that generates the support set within the MN architecture through a multitask formulation, which couples the support-set generation loss with the main MN-CTC loss to jointly optimize the support sets and the embedding functions of MN. On the TIMIT and Librispeech datasets, we establish the few-shot effectiveness of the proposed variants with PER and LER results. We also demonstrate the cross-domain applicability of the MN-CTC formulation: a Librispeech-trained 'Coupled MN-CTC' variant, run at inference on the low-resource TIMIT target corpus, shows an 8% (absolute) LER advantage over a single-domain (TIMIT-only) scenario.
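For readers unfamiliar with the role of the labelled support set, the following is a minimal sketch of the generic Matching Networks classification rule: a query embedding attends over the support embeddings through a softmax of cosine similarities, and the attention weights are summed per label. It is not the authors' MN-CTC implementation. The toy two-dimensional embeddings, the labels "aa" and "sh", and the function names are all hypothetical, and the learned embedding functions, the CTC loss, and the support-set generation step described in the abstract are omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def matching_posterior(query, support, labels):
    """Label posterior for one query embedding given a labelled support set.

    support: list of (embedding, label) pairs (hypothetical toy data here).
    Returns a dict mapping each label to its probability.
    """
    sims = [cosine(query, emb) for emb, _ in support]
    # Numerically stable softmax over the similarities (the attention kernel).
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    # Sum the attention weights of the support items that share a label.
    posterior = {lab: 0.0 for lab in labels}
    for weight, (_, lab) in zip(exps, support):
        posterior[lab] += weight / total
    return posterior

# Assumed toy support set: two examples of "aa", one of "sh".
support = [([1.0, 0.0], "aa"), ([0.9, 0.1], "aa"), ([0.0, 1.0], "sh")]
post = matching_posterior([1.0, 0.05], support, ["aa", "sh"])
print(post)
```

In a frame-level setting, a posterior like this would be computed for every input frame, which is the kind of per-frame label distribution a CTC-style loss consumes. How the paper builds and couples these pieces is described only at the level of the abstract above.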
Pages: 444-448
Number of pages: 5