Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource Speech Recognition

Cited by: 28
Authors
Yi, Cheng [1 ,2 ]
Zhou, Shiyu [1 ]
Xu, Bo [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100190, Peoples R China
Keywords
Acoustics; Bit error rate; Linguistics; Task analysis; Training; Decoding; Data models; BERT; end-to-end modeling; low-resource ASR; pre-training; wav2vec; CTC; ASR
DOI
10.1109/LSP.2021.3071668
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
End-to-end models have achieved impressive results on automatic speech recognition (ASR). For low-resource ASR tasks, however, the available labeled data rarely satisfy the demands of end-to-end models. Self-supervised acoustic pre-training already yields strong ASR performance, but the limited transcriptions remain insufficient for language modeling within end-to-end models. In this work, we fuse a pre-trained acoustic encoder (wav2vec 2.0) and a pre-trained linguistic encoder (BERT) into a single end-to-end ASR model. The fused model only needs to learn the mapping from speech to language during fine-tuning on limited labeled data. The sequence lengths of the two modalities are matched by a monotonic attention mechanism that introduces no additional parameters, and a fully connected layer bridges the hidden representations of the two modalities. We further propose a scheduled fine-tuning strategy to preserve and exploit the text-context modeling ability of the pre-trained linguistic encoder. Experiments show that both pre-trained modules are used effectively: our model achieves better recognition performance on the CALLHOME corpus (15 hours) than other end-to-end models.
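To make the fusion idea in the abstract concrete, the following is a minimal sketch, assuming PyTorch. It is not the authors' released code: adaptive average pooling over time stands in for the paper's parameter-free monotonic attention, and the names (FusedASRModel, monotonic_aggregate) and toy encoders are hypothetical. The sketch only illustrates how a single fully connected layer and a parameter-free length-matching step could connect a pre-trained acoustic encoder (e.g. wav2vec 2.0) to a pre-trained linguistic encoder (e.g. BERT).

```python
# Illustrative sketch only; module names, dimensions, and the exact
# alignment rule are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


def monotonic_aggregate(h_acoustic: torch.Tensor, target_len: int) -> torch.Tensor:
    """Parameter-free monotonic length matching.

    Averages contiguous groups of acoustic frames so the output has
    `target_len` steps; a stand-in for the paper's monotonic attention,
    which likewise adds no trainable parameters.
    h_acoustic: (batch, T_acoustic, d_acoustic)
    returns:    (batch, target_len, d_acoustic)
    """
    pooled = nn.functional.adaptive_avg_pool1d(h_acoustic.transpose(1, 2), target_len)
    return pooled.transpose(1, 2)


class FusedASRModel(nn.Module):
    def __init__(self, acoustic_encoder: nn.Module, linguistic_encoder: nn.Module,
                 d_acoustic: int, d_linguistic: int, vocab_size: int):
        super().__init__()
        self.acoustic_encoder = acoustic_encoder      # e.g. a pre-trained wav2vec 2.0
        self.linguistic_encoder = linguistic_encoder  # e.g. a pre-trained BERT
        # Single fully connected layer bridging the two hidden spaces.
        self.bridge = nn.Linear(d_acoustic, d_linguistic)
        self.output = nn.Linear(d_linguistic, vocab_size)

    def forward(self, speech_features: torch.Tensor, target_len: int) -> torch.Tensor:
        h_a = self.acoustic_encoder(speech_features)   # (B, T, d_acoustic)
        h_a = monotonic_aggregate(h_a, target_len)     # (B, L, d_acoustic)
        h_l = self.linguistic_encoder(self.bridge(h_a))  # (B, L, d_linguistic)
        return self.output(h_l)                        # (B, L, vocab_size)


# Example usage with toy linear layers standing in for the pre-trained encoders:
if __name__ == "__main__":
    toy_acoustic = nn.Linear(80, 512)      # stands in for wav2vec 2.0 (d_acoustic=512)
    toy_linguistic = nn.Linear(768, 768)   # stands in for BERT (d_linguistic=768)
    model = FusedASRModel(toy_acoustic, toy_linguistic,
                          d_acoustic=512, d_linguistic=768, vocab_size=1000)
    feats = torch.randn(2, 300, 80)        # (batch, acoustic frames, feature dim)
    logits = model(feats, target_len=25)   # -> (2, 25, 1000)
```

The scheduled fine-tuning strategy mentioned in the abstract would sit on top of such a model; one hedged reading is that the linguistic encoder is kept frozen (or fed reliable text-side inputs) early in fine-tuning and gradually exposed to acoustic-derived inputs later, so its text-context modeling ability is preserved. The exact schedule is specific to the paper.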
Pages: 788-792
Page count: 5