Reducing Domain mismatch in Self-supervised speech pre-training

Cited: 0
Authors
Baskar, Murali Karthick [1 ]
Rosenberg, Andrew [2 ]
Ramabhadran, Bhuvana [2 ]
Zhang, Yu [2 ]
Affiliations
[1] Brno Univ Technol, Brno, Czech Republic
[2] Google Inc, Mountain View, CA USA
Source
INTERSPEECH 2022
Keywords
Self-supervision; Wav2vec2; pre-training; Data selection; Domain mismatch; ASR; speech recognition
DOI
10.21437/Interspeech.2022-736
CLC number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Masked speech modeling (MSM) methods such as wav2vec2 or w2v-BERT learn representations over speech frames that are randomly masked within an utterance. While these methods improve the performance of Automatic Speech Recognition (ASR) systems, they have one major limitation: they treat all unsupervised speech samples with equal weight, which hinders learning because not all samples contain information relevant to learning meaningful representations. In this work, we address this limitation. We propose ask2mask (ATM), a novel approach that focuses on specific samples during MSM pre-training. ATM employs an external ASR model, or scorer, to weight unsupervised input samples by performing fine-grained data selection, and it masks the input frames chosen with high confidence by the scorer. This allows the model to learn meaningful representations. We conduct fine-tuning experiments on well-benchmarked corpora: LibriSpeech (matching the pre-training data), and AMI and CHiME-6 (not matching the pre-training data). The results substantiate the efficacy of ATM in significantly improving recognition performance under mismatched conditions, while still yielding modest improvements under matched conditions.
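The mechanism described in the abstract (an external scorer assigns per-frame confidences, and MSM masking is concentrated on the frames the scorer is most confident about) can be illustrated with a small sketch. The NumPy code below, including the function name ask2mask_sketch, the masking budget mask_prob, and the fixed span length, is an illustrative assumption drawn only from the abstract; it does not reproduce the paper's actual algorithm or hyperparameters.

```python
import numpy as np


def ask2mask_sketch(frame_confidences, mask_prob=0.065, span_length=10, rng=None):
    """Confidence-guided masking sketch (hypothetical, inspired by the ATM idea).

    frame_confidences: 1-D array of per-frame confidences in [0, 1] produced by
                       an external ASR model / scorer (assumed to exist upstream).
    mask_prob:         fraction of frames used as mask-span start positions
                       (illustrative value, not the paper's setting).
    span_length:       number of consecutive frames masked per start position.
    Returns a boolean mask over frames (True = frame is masked).
    """
    rng = np.random.default_rng() if rng is None else rng
    conf = np.asarray(frame_confidences, dtype=np.float64)
    num_frames = conf.shape[0]
    num_starts = max(1, int(round(mask_prob * num_frames)))

    # Turn the scorer's confidences into a sampling distribution so that
    # mask spans preferentially start at high-confidence frames.
    probs = conf / conf.sum() if conf.sum() > 0 else np.full(num_frames, 1.0 / num_frames)
    starts = rng.choice(num_frames, size=min(num_starts, num_frames),
                        replace=False, p=probs)

    # Mask a fixed-length span of frames from each sampled start position.
    mask = np.zeros(num_frames, dtype=bool)
    for start in starts:
        mask[start:start + span_length] = True
    return mask


if __name__ == "__main__":
    # Example: mask a 200-frame utterance using synthetic scorer confidences.
    rng = np.random.default_rng(0)
    confidences = rng.uniform(size=200)
    mask = ask2mask_sketch(confidences, rng=rng)
    print(f"masked {mask.sum()} of {mask.size} frames")
```

Relative to standard wav2vec2-style masking, the only change in this sketch is that span start positions are sampled from a confidence-weighted distribution rather than uniformly at random.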
Pages: 3028-3032
Page count: 5
Related papers (50 in total)
  • [31] Self-Supervised Pre-training on the Target Domain for Cross-Domain Person Re-identification. Zhang, Junyin; Ge, Yongxin; Gu, Xinqian; Hua, Boyu; Xiang, Tao. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 4268-4276.
  • [32] Text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks. Ma, Duo; Yue, Xianghu; Ao, Junyi; Gao, Xiaoxue; Li, Haizhou. IEEE SIGNAL PROCESSING LETTERS, 2024, 31: 2055-2059.
  • [33] Token Boosting for Robust Self-Supervised Visual Transformer Pre-training. Li, Tianjiao; Foo, Lin Geng; Hu, Ping; Shang, Xindi; Rahmani, Hossein; Yuan, Zehuan; Liu, Jun. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 24027-24038.
  • [34] Joint Encoder-Decoder Self-Supervised Pre-training for ASR. Arunkumar, A.; Umesh, S. INTERSPEECH 2022, 2022: 3418-3422.
  • [35] SslTransT: Self-supervised pre-training visual object tracking with Transformers. Cai, Yannan; Tan, Ke; Wei, Zhenzhong. OPTICS COMMUNICATIONS, 2024, 557.
  • [36] Self-supervised Heterogeneous Graph Pre-training Based on Structural Clustering. Yang, Yaming; Guan, Ziyu; Wang, Zhe; Zhao, Wei; Xu, Cai; Lu, Weigang; Huang, Jianbin. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022.
  • [37] Class incremental learning with self-supervised pre-training and prototype learning. Liu, Wenzhuo; Wu, Xin-Jian; Zhu, Fei; Yu, Ming-Ming; Wang, Chuang; Liu, Cheng-Lin. PATTERN RECOGNITION, 2025, 157.
  • [38] DialogueBERT: A Self-Supervised Learning based Dialogue Pre-training Encoder. Zhang, Zhenyu; Guo, Tao; Chen, Meng. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021: 3647-3651.
  • [39] Individualized Stress Mobile Sensing Using Self-Supervised Pre-Training. Islam, Tanvir; Washington, Peter. APPLIED SCIENCES-BASEL, 2023, 13 (21).
  • [40] Self-Supervised Pre-training for Protein Embeddings Using Tertiary Structures. Guo, Yuzhi; Wu, Jiaxiang; Ma, Hehuan; Huang, Junzhou. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 6801-6809.