Few-shot learning for E2E speech recognition: architectural variants for support set generation

Cited by: 0

Authors:

Eledath, Dhanya [1]

Thurlapati, Narasimha Rao [2]

Pavithra, V [2]

Banerjee, Tirthankar [3]

Ramasubramanian, V [3]

Affiliations:

[1] Int Inst Informat Technol Bangalore IIITB, Bangalore, Karnataka, India

[2] Samsung R&D Inst Bangalore SRI B, Bangalore, Karnataka, India

[3] IIIT Bangalore, Bangalore, Karnataka, India

Source:

2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022) | 2022

Keywords:

Few-shot Learning; Matching Networks; Continuous Speech Recognition; Coupled and Uncoupled architectures; Support Set Generation;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we propose two architectural variants of our recent adaptation of the 'few-shot learning' (FSL) framework 'Matching Networks' (MN) to end-to-end (E2E) continuous speech recognition (CSR). The adaptation, termed 'MN-CTC', involves CTC-loss based end-to-end episodic training of MN and an associated CTC-based decoding of continuous speech. An important component of the MN theory is the labelled support set used during training and inference. The architectural variants proposed and studied here for E2E CSR, namely the 'Uncoupled MN-CTC' and the 'Coupled MN-CTC', address the problem of generating supervised support sets from continuous speech. While the 'Uncoupled MN-CTC' generates the support sets 'outside' the MN architecture, the 'Coupled MN-CTC' variant is a derivative framework which generates the support set 'within' the MN architecture through a multitask formulation that couples the support-set generation loss and the main MN-CTC loss to jointly optimize the support sets and the embedding functions of MN. On the TIMIT and Librispeech datasets, we establish the 'few-shot' effectiveness of the proposed variants with PER and LER performances. We also demonstrate the cross-domain applicability of the MN-CTC formulation: a Librispeech-trained 'Coupled MN-CTC' variant performing inference on the low-resource TIMIT target corpus achieves an 8% (absolute) LER advantage over a single-domain (TIMIT only) scenario.
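The abstract does not spell out the formulation, so the following is only a minimal sketch of the two ideas it names: the generic Matching Networks step, in which a query embedding attends over a labelled support set to produce class posteriors, and a multitask loss that adds a support-set generation loss to the main MN-CTC loss. Cosine similarity with a softmax kernel is assumed here, as is the plain weighted sum; the function names `mn_posterior` and `coupled_loss` and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mn_posterior(query, support, labels, num_classes):
    """Class posteriors for one query embedding (e.g. one speech frame).

    Assumption: attention is a softmax over cosine similarities between
    the query and every support embedding; each support item then votes
    for its own label with its attention weight.
    """
    sims = [cosine(query, s) for s in support]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    posterior = [0.0] * num_classes
    for e, y in zip(exps, labels):
        posterior[y] += e / total
    return posterior


def coupled_loss(mn_ctc_loss, support_loss, weight=0.5):
    """Hypothetical multitask objective: main loss plus weighted auxiliary loss."""
    return mn_ctc_loss + weight * support_loss


# Two support embeddings with labels 0 and 1; the query matches the first.
post = mn_posterior([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0, 1], num_classes=2)
print(post)
```

In a CTC setting, per-frame posteriors of this kind would be stacked over time and passed to a CTC loss during training; that step is omitted from the sketch.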


Pages: 444-448

Page count: 5
