Biased Self-supervised Learning for ASR

Cited: 0
Authors
Kreyssig, Florian L. [1 ]
Shi, Yangyang [2 ]
Guo, Jinxi [2 ]
Sari, Leda [2 ]
Mohamed, Abdelrahman [2 ]
Woodland, Philip C. [1 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge, England
[2] Meta AI, New York, NY USA
Source
INTERSPEECH 2023
Keywords
speech recognition; self-supervised learning; semi-supervised; unsupervised
DOI
10.21437/Interspeech.2023-2499
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
Self-supervised learning via masked prediction pre-training (MPPT) has shown impressive performance on a range of speech-processing tasks. This paper proposes a method to bias self-supervised learning towards a specific task. The core idea is to slightly fine-tune the model that is used to obtain the target sequence. This leads to better performance and a substantial increase in training speed. Furthermore, this paper proposes a variant of MPPT that allows low-footprint streaming models to be trained effectively by computing the MPPT loss on both masked and unmasked frames. These approaches are evaluated for automatic speech recognition on the LibriSpeech corpus, where 100 hours of data served as the labelled data and 860 hours as the unlabelled data. The biased training outperforms the unbiased training by 15.5% after 250k updates and 23.8% after 100k updates on test-other. For the streaming models, the pre-training approach yields a reduction in word error rate of 44.1%.
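As a reading aid for the abstract, the following is a minimal sketch of an MPPT-style loss computed on both masked and unmasked frames, the variant the paper describes for low-footprint streaming models. All names here (mppt_loss, student_logits, teacher_targets, unmasked_weight), the tensor shapes, and the scalar weighting scheme are assumptions chosen for illustration; they are not the authors' implementation, whose exact formulation is not given in the abstract.

import torch
import torch.nn.functional as F

def mppt_loss(student_logits, teacher_targets, mask, unmasked_weight=1.0):
    # Hypothetical sketch, not the paper's code.
    # student_logits:  (B, T, V) frame-level predictions over V discrete units.
    # teacher_targets: (B, T) long tensor of discrete units from the target
    #                  model; slightly fine-tuning that model on labelled data
    #                  is the "biasing" step the abstract describes.
    # mask:            (B, T) boolean tensor, True where input frames were masked.
    per_frame = F.cross_entropy(
        student_logits.transpose(1, 2),  # (B, V, T), the layout cross_entropy expects
        teacher_targets,
        reduction="none",
    )  # (B, T) per-frame cross-entropy
    # Assumes each batch contains both masked and unmasked frames.
    masked_loss = per_frame[mask].mean()
    unmasked_loss = per_frame[~mask].mean()
    # Standard MPPT scores only masked frames; the streaming variant in the
    # abstract also scores unmasked frames, here with an assumed scalar weight.
    return masked_loss + unmasked_weight * unmasked_loss

Scoring unmasked frames gives a small streaming encoder, which sees little future context, a learning signal on every frame rather than only on the masked subset; the relative weighting above is a placeholder for whatever balance the full paper uses.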
Pages: 4948 - 4952
Page count: 5
Related Papers
50 records in total; 10 shown below
  • [1] ASBERT: ASR-SPECIFIC SELF-SUPERVISED LEARNING WITH SELF-TRAINING
    Kim, Hyung Yong
    Kim, Byeong-Yeol
    Yoo, Seung Woo
    Lim, Youshin
    Lim, Yunkyu
    Lee, Hanbin
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 9 - 14
  • [2] ASR ERROR CORRECTION WITH DUAL-CHANNEL SELF-SUPERVISED LEARNING
    Zhang, Fan
    Tu, Mei
    Liu, Song
    Yan, Jinyao
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7282 - 7286
  • [3] Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning
    Cao, Songjun
    Kang, Yueteng
    Fu, Yanzhe
    Xu, Xiaoshuo
    Sun, Sining
    Zhang, Yike
    Ma, Long
    INTERSPEECH 2021, 2021, : 706 - 710
  • [4] EFFECTIVENESS OF SELF-SUPERVISED PRE-TRAINING FOR ASR
    Baevski, Alexei
    Mohamed, Abdelrahman
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7694 - 7698
  • [5] Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
    Fan, Ruchao
    Shankar, Natarajan Balaji
Alwan, Abeer
    INTERSPEECH 2024, 2024, : 5173 - 5177
  • [6] Gated Self-supervised Learning for Improving Supervised Learning
    Fuadi, Erland Hillman
    Ruslim, Aristo Renaldo
    Wardhana, Putu Wahyu Kusuma
    Yudistira, Novanto
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 611 - 615
  • [7] Self-Supervised Dialogue Learning
    Wu, Jiawei
    Wang, Xin
    Wang, William Yang
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3857 - 3867
  • [8] Self-supervised learning model
    Saga, Kazushie
    Sugasaka, Tamami
    Sekiguchi, Minoru
Fujitsu Scientific and Technical Journal, 1993, 29 (03): 209 - 216
  • [9] Longitudinal self-supervised learning
    Zhao, Qingyu
    Liu, Zixuan
    Adeli, Ehsan
    Pohl, Kilian M.
    MEDICAL IMAGE ANALYSIS, 2021, 71
  • [10] Credal Self-Supervised Learning
    Lienen, Julian
    Huellermeier, Eyke
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34