Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource Speech Recognition

Citations: 24
|
Authors
Yi, Cheng [1 ,2 ]
Zhou, Shiyu [1 ]
Xu, Bo [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100190, Peoples R China
Keywords
Acoustics; Bit error rate; Linguistics; Task analysis; Training; Decoding; Data models; BERT; end-to-end modeling; low-resource ASR; pre-training; wav2vec; CTC; ASR;
DOI
10.1109/LSP.2021.3071668
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
End-to-end models have achieved impressive results on the task of automatic speech recognition (ASR). For low-resource ASR tasks, however, labeled data can hardly satisfy the demands of end-to-end models. Self-supervised acoustic pre-training has already shown impressive ASR performance, but the available transcriptions remain inadequate for language modeling in end-to-end models. In this work, we fuse a pre-trained acoustic encoder (wav2vec2.0) and a pre-trained linguistic encoder (BERT) into an end-to-end ASR model. The fused model only needs to learn the transfer from speech to language during fine-tuning on limited labeled data. The lengths of the two modalities are matched by a monotonic attention mechanism without additional parameters. In addition, a fully connected layer is introduced for the hidden-state mapping between modalities. We further propose a scheduled fine-tuning strategy to preserve and utilize the text-context modeling ability of the pre-trained linguistic encoder. Experiments show the effectiveness of utilizing the pre-trained modules. Our model achieves better recognition performance on the CALLHOME corpus (15 hours) than other end-to-end models.
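The parameter-free monotonic length matching mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes per-frame scalar weights are already available (in a model of this kind they would be predicted from the acoustic encoder's output), and the function name `monotonic_aggregate` and the threshold-firing accumulation scheme (in the style of continuous integrate-and-fire monotonic attention) are assumptions for this sketch.

```python
import numpy as np

def monotonic_aggregate(frames, weights, threshold=1.0):
    """Collapse T acoustic frame vectors into token-level vectors.

    Per-frame weights are accumulated left to right; each time the
    running sum reaches `threshold`, one token vector is emitted as the
    weight-averaged sum of the frames consumed so far. The alignment is
    monotonic and introduces no trainable parameters of its own.
    """
    tokens = []
    acc_w = 0.0
    acc_v = np.zeros(frames.shape[1])
    for f, w in zip(frames, weights):
        if acc_w + w >= threshold:
            # The firing frame's weight is split at the token boundary.
            need = threshold - acc_w
            acc_v += need * f
            tokens.append(acc_v)
            acc_w = w - need          # leftover weight starts the next token
            acc_v = acc_w * f
        else:
            acc_w += w
            acc_v = acc_v + w * f
    return np.stack(tokens) if tokens else np.empty((0, frames.shape[1]))

# Four identical frames, each with weight 0.5, yield two token vectors.
out = monotonic_aggregate(np.ones((4, 2)), [0.5, 0.5, 0.5, 0.5])
```

With uniform unit frames and weights of 0.5, the sketch compresses four acoustic frames into two token-level vectors, shortening the acoustic sequence toward the linguistic sequence length before the fully connected mapping into the linguistic encoder's hidden space.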
Pages: 788-792
Page count: 5
Related Papers
(50 in total)
  • [1] Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition
    Zheng, Guolin
    Xiao, Yubei
    Gong, Ke
    Zhou, Pan
    Liang, Xiaodan
    Lin, Liang
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 2765 - 2777
  • [2] Multilingual acoustic models for speech recognition in low-resource devices
    Garcia, Enrique Gil
    Mengusoglu, Erhan
    Janke, Eric
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 981 - +
  • [3] Acoustic Modeling for Hindi Speech Recognition in Low-Resource Settings
    Dey, Anik
    Zhang, Weibin
    Fung, Pascale
    [J]. 2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2, 2014, : 891 - 894
  • [4] Acoustic Modeling Based on Deep Learning for Low-Resource Speech Recognition: An Overview
    Yu, Chongchong
    Kang, Meng
    Chen, Yunbing
    Wu, Jiajia
    Zhao, Xia
    [J]. IEEE ACCESS, 2020, 8 : 163829 - 163843
  • [5] CURRICULUM OPTIMIZATION FOR LOW-RESOURCE SPEECH RECOGNITION
    Kuznetsova, Anastasia
    Kumar, Anurag
    Fox, Jennifer Drexler
    Tyers, Francis M.
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8187 - 8191
  • [6] Enrollment in low-resource speech recognition systems
    Deligne, S
    Dharanipragada, S
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 341 - 344
  • [7] Acoustic model training using self-attention for low-resource speech recognition
    Park, Hosung
    Kim, Ji-Hwan
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39 (05): : 483 - 489
  • [8] DEEP MAXOUT NETWORKS FOR LOW-RESOURCE SPEECH RECOGNITION
    Miao, Yajie
    Metze, Florian
    Rawat, Shourabh
    [J]. 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 398 - 403
  • [9] ADVERSARIAL MULTILINGUAL TRAINING FOR LOW-RESOURCE SPEECH RECOGNITION
    Yi, Jiangyan
    Tao, Jianhua
    Wen, Zhengqi
    Bai, Ye
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4899 - 4903
  • [10] LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
    Xu, Jin
    Tan, Xu
    Ren, Yi
    Qin, Tao
    Li, Jian
    Zhao, Sheng
    Liu, Tie-Yan
    [J]. KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 2802 - 2812