AN ADAPTER BASED PRE-TRAINING FOR EFFICIENT AND SCALABLE SELF-SUPERVISED SPEECH REPRESENTATION LEARNING

Cited by: 8
Authors
Kessler, Samuel [2 ]
Thomas, Bethan [1 ]
Karout, Salah [1 ]
Affiliations
[1] Huawei R&D UK, Cambridge, England
[2] Univ Oxford, Oxford, England
Keywords
Self-Supervision; Transfer Learning; Continual Learning; Automatic Speech Recognition
DOI
10.1109/ICASSP43922.2022.9747374
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We present a method for transferring pre-trained self-supervised learning (SSL) speech representations to multiple languages. Unannotated speech is abundant, so learning self-supervised representations from raw audio and fine-tuning on small annotated datasets is a promising direction for building speech recognition systems. SSL models are generally pre-trained on raw audio and then fine-tuned on a small fraction of annotated data; such models have produced state-of-the-art results for ASR. However, these models are very expensive to pre-train. We use an existing wav2vec 2.0 model and tackle the problem of learning new language representations while utilizing existing model knowledge. Crucially, we do so without catastrophic forgetting of the existing language representation. We use adapter modules to speed up pre-training on a new language task. Our model decreases pre-training time by 32% when learning a new language task, and learns the new audio-language representation without forgetting the previous language representation. We evaluate by applying these language representations to automatic speech recognition.
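The mechanism the abstract describes (small adapter modules trained on top of a frozen, pre-trained encoder, so the original language representation cannot be overwritten) can be illustrated with a minimal PyTorch sketch. This is a sketch under stated assumptions, not the paper's implementation: the bottleneck width, the GELU activation, and the `attach_adapters` helper are illustrative choices, and the paper applies its adapters during the SSL pre-training phase of the actual wav2vec 2.0 transformer rather than to a generic encoder.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project,
    plus a residual connection (in the style of Houlsby et al., 2019).
    Width and activation are illustrative assumptions."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # project to a small bottleneck
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)    # project back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen encoder's output intact.
        return x + self.up(self.act(self.down(x)))


def attach_adapters(encoder: nn.Module, num_layers: int, dim: int) -> nn.ModuleList:
    """Hypothetical helper: freeze every pre-trained parameter and create
    one trainable adapter per transformer layer for the new language."""
    for p in encoder.parameters():
        p.requires_grad = False  # the existing representation cannot be overwritten
    return nn.ModuleList(Adapter(dim) for _ in range(num_layers))


# Shape check on a dummy batch of (batch, frames, hidden) features.
x = torch.randn(2, 100, 768)
print(Adapter(768)(x).shape)  # torch.Size([2, 100, 768])
```

Because only the adapter parameters receive gradients, pre-training on a new language updates a small fraction of the model's weights while the frozen base is untouched, which is the property that makes a pre-training speed-up without catastrophic forgetting plausible.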
Pages: 3179-3183
Number of pages: 5