Reducing Domain Mismatch in Self-Supervised Speech Pre-Training

Cited by: 0
Authors
Baskar, Murali Karthick [1 ]
Rosenberg, Andrew [2 ]
Ramabhadran, Bhuvana [2 ]
Zhang, Yu [2 ]
Affiliations
[1] Brno Univ Technol, Brno, Czech Republic
[2] Google Inc, Mountain View, CA USA
Source
INTERSPEECH 2022
Keywords
Self-supervision; wav2vec2; pre-training; data selection; domain mismatch; ASR; speech recognition
DOI
10.21437/Interspeech.2022-736
CLC number
O42 [Acoustics]
Subject classification
070206; 082403
Abstract
Masked speech modeling (MSM) methods such as wav2vec2 or w2v-BERT learn representations over speech frames that are randomly masked within an utterance. While these methods improve the performance of automatic speech recognition (ASR) systems, they have one major limitation: they treat all unsupervised speech samples with equal weight, which hinders learning because not all samples carry information relevant to learning meaningful representations. In this work, we address this limitation. We propose ask2mask (ATM), a novel approach that focuses on specific samples during MSM pre-training. ATM employs an external ASR model, or scorer, to weight unsupervised input samples through fine-grained data selection, and it masks the input frames in which the scorer is most confident. This allows the model to learn meaningful representations. We conduct fine-tuning experiments on well-benchmarked corpora: LibriSpeech, which matches the pre-training data, and AMI and CHiME-6, which do not. The results substantiate the efficacy of ATM: it significantly improves recognition performance under mismatched conditions while still yielding modest improvements under matched conditions.
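The abstract's central mechanism is that an external ASR scorer rates each unsupervised frame, and masking is then concentrated on the frames the scorer is most confident about. The following minimal sketch illustrates that selection step only; the per-frame confidence array and the probability-proportional sampling are illustrative assumptions, not the paper's specification:

    import numpy as np

    def atm_mask_indices(confidences, mask_ratio=0.15, rng=None):
        """Pick frame indices to mask, biased toward high-confidence frames.

        Sketch of the ask2mask (ATM) idea described in the abstract:
        `confidences` is a (T,) array of per-frame scores in (0, 1]
        produced by some external ASR scorer (interface assumed here).
        """
        rng = rng or np.random.default_rng()
        num_frames = len(confidences)
        num_masked = max(1, int(mask_ratio * num_frames))
        # Normalize confidences into a sampling distribution so that
        # high-confidence frames are chosen for masking more often.
        probs = confidences / confidences.sum()
        return rng.choice(num_frames, size=num_masked, replace=False, p=probs)

    # Usage with made-up per-frame confidences for an 8-frame utterance:
    conf = np.array([0.90, 0.20, 0.80, 0.10, 0.95, 0.40, 0.70, 0.30])
    masked = atm_mask_indices(conf, mask_ratio=0.25)
    print(sorted(masked))  # indices of the frames chosen for masking

A plain random-masking baseline would instead sample the indices uniformly; the sketch differs only in weighting the draw by scorer confidence.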
Pages: 3028-3032
Page count: 5
Related Papers
50 records in total
  • [1] CDS: Cross-Domain Self-supervised Pre-training
    Kim, Donghyun
    Saito, Kuniaki
    Oh, Tae-Hyun
    Plummer, Bryan A.
    Sclaroff, Stan
    Saenko, Kate
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 9103 - 9112
  • [2] MEASURING THE IMPACT OF DOMAIN FACTORS IN SELF-SUPERVISED PRE-TRAINING
    Sanabria, Ramon
    Hsu, Wei-Ning
    Baevski, Alexei
    Auli, Michael
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023
  • [3] Self-supervised ECG pre-training
    Liu, Han
    Zhao, Zhenbo
    She, Qiang
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2021, 70
  • [4] Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training
    Huang, Sung-Feng
    Chuang, Shun-Po
    Liu, Da-Rong
    Chen, Yi-Chen
    Yang, Gene-Ping
    Lee, Hung-yi
    [J]. INTERSPEECH 2021, 2021, : 3056 - 3060
  • [5] GUIDED CONTRASTIVE SELF-SUPERVISED PRE-TRAINING FOR AUTOMATIC SPEECH RECOGNITION
    Khare, Aparna
    Wu, Minhua
    Bhati, Saurabhchand
    Droppo, Jasha
    Maas, Roland
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 174 - 181
  • [6] COMPARISON OF SELF-SUPERVISED SPEECH PRE-TRAINING METHODS ON FLEMISH DUTCH
    Poncelet, Jakob
    Van Hamme, Hugo
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 169 - 176
  • [7] Self-Supervised Underwater Image Generation for Underwater Domain Pre-Training
    Wu, Zhiheng
    Wu, Zhengxing
    Chen, Xingyu
    Lu, Yue
    Yu, Junzhi
    [J]. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 14
  • [8] Self-supervised Pre-training for Mirror Detection
    Lin, Jiaying
    Lau, Rynson W. H.
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 12193 - 12202
  • [9] Self-supervised Pre-training for Nuclei Segmentation
    Haq, Mohammad Minhazul
    Huang, Junzhou
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT II, 2022, 13432 : 303 - 313
  • [10] EFFECTIVENESS OF SELF-SUPERVISED PRE-TRAINING FOR ASR
    Baevski, Alexei
    Mohamed, Abdelrahman
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7694 - 7698