Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource Speech Recognition

Cited by: 24
Authors
Yi, Cheng [1,2]
Zhou, Shiyu [1 ]
Xu, Bo [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100190, Peoples R China
Keywords
Acoustics; Bit error rate; Linguistics; Task analysis; Training; Decoding; Data models; BERT; end-to-end modeling; low-resource ASR; pre-training; wav2vec; CTC; ASR;
DOI
10.1109/LSP.2021.3071668
Chinese Library Classification
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline Codes
0808; 0809;
Abstract
End-to-end models have achieved impressive results on automatic speech recognition (ASR). For low-resource ASR tasks, however, labeled data can hardly satisfy the demands of end-to-end models. Self-supervised acoustic pre-training has already achieved impressive ASR performance, but the available transcriptions remain inadequate for language modeling in end-to-end models. In this work, we fuse a pre-trained acoustic encoder (wav2vec 2.0) and a pre-trained linguistic encoder (BERT) into an end-to-end ASR model. The fused model only needs to learn the transfer from speech to language during fine-tuning on limited labeled data. The lengths of the two modalities are matched by a monotonic attention mechanism without additional parameters, and a fully connected layer is introduced for the hidden-state mapping between modalities. We further propose a scheduled fine-tuning strategy to preserve and utilize the text-context modeling ability of the pre-trained linguistic encoder. Experiments show that the pre-trained modules are utilized effectively: our model achieves better recognition performance on the CALLHOME corpus (15 hours) than other end-to-end models.
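The core fusion problem the abstract describes is length mismatch: the acoustic encoder emits one vector per frame (many per second), while the linguistic encoder expects one vector per token. The sketch below illustrates the parameter-free, order-preserving length matching with simple contiguous mean-pooling in NumPy; this is only an illustrative stand-in for the paper's monotonic attention mechanism, which the abstract does not fully specify, and the function name `monotonic_pool` is our own.

```python
import numpy as np

def monotonic_pool(acoustic: np.ndarray, num_tokens: int) -> np.ndarray:
    """Parameter-free monotonic length matching (illustrative sketch only).

    Splits the T acoustic frames into num_tokens contiguous segments and
    averages each segment. Frame order is preserved (monotonic) and no
    weights are learned, mirroring the spirit of the paper's mechanism,
    though not its exact formulation.
    """
    T, D = acoustic.shape
    # Boundaries of num_tokens roughly equal contiguous segments.
    edges = np.linspace(0, T, num_tokens + 1).round().astype(int)
    return np.stack([acoustic[a:b].mean(axis=0)
                     for a, b in zip(edges[:-1], edges[1:])])

# Toy example: 50 frame-level features (8-dim) -> 10 token-level vectors,
# which a linear layer could then map into the linguistic encoder's space.
feats = np.random.randn(50, 8)
tokens = monotonic_pool(feats, 10)
print(tokens.shape)  # (10, 8)
```

After such pooling, the abstract's fully connected layer would map each token-level acoustic vector into the linguistic encoder's hidden space, so only that mapping must be learned during fine-tuning.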
Pages: 788 - 792 (5 pages)