Reducing Domain Mismatch in Self-Supervised Speech Pre-training

Citations: 0
Authors
Baskar, Murali Karthick [1]
Rosenberg, Andrew [2]
Ramabhadran, Bhuvana [2]
Zhang, Yu [2]
Affiliations
[1] Brno Univ Technol, Brno, Czech Republic
[2] Google Inc, Mountain View, CA USA
Keywords
Self-supervision; Wav2vec2; pretraining; Data selection; Domain mismatch; ASR; speech recognition
DOI
10.21437/Interspeech.2022-736
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Masked speech modeling (MSM) methods such as wav2vec2 or w2v-BERT learn representations over speech frames that are randomly masked within an utterance. While these methods improve the performance of Automatic Speech Recognition (ASR) systems, they have one major limitation: they treat all unsupervised speech samples with equal weight, which hinders learning, as not all samples carry information relevant to learning meaningful representations. In this work, we address this limitation. We propose ask2mask (ATM), a novel approach to focus on specific samples during MSM pre-training. ATM employs an external ASR model, or scorer, to weight unsupervised input samples by performing fine-grained data selection. ATM masks the highly confident input frames chosen by the scorer, which allows the model to learn meaningful representations. We conduct fine-tuning experiments on well-benchmarked corpora: LibriSpeech (matching the pre-training data), and AMI and CHiME-6 (not matching the pre-training data). The results substantiate the efficacy of ATM in significantly improving recognition performance under mismatched conditions while still yielding modest improvements under matched conditions.
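The selection idea in the abstract can be sketched in a few lines: rather than masking frames uniformly at random as in standard MSM, restrict the mask positions to frames an external scorer rates as highly confident. The function below is a minimal illustration of that idea only; the scorer, the `top_fraction` candidate pool, and the selection rule are assumptions for this sketch, not the paper's published configuration.

```python
import numpy as np

def ask2mask_positions(confidences, mask_ratio=0.15, top_fraction=0.5, rng=None):
    """Pick frame indices to mask, restricted to the most confident frames.

    A hedged sketch of the ATM idea: `confidences` is a 1-D array of
    per-frame confidences from a hypothetical external scorer. Instead of
    sampling mask positions uniformly over all frames, we sample them only
    from the top `top_fraction` of frames ranked by scorer confidence.
    All parameter values here are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    confidences = np.asarray(confidences, dtype=float)
    n_frames = confidences.shape[0]
    # Candidate pool: the most confident `top_fraction` of all frames.
    n_candidates = max(1, int(top_fraction * n_frames))
    candidates = np.argsort(confidences)[::-1][:n_candidates]
    # Mask `mask_ratio` of all frames, drawn from the confident candidates.
    n_mask = min(n_candidates, max(1, int(mask_ratio * n_frames)))
    return np.sort(rng.choice(candidates, size=n_mask, replace=False))
```

In a wav2vec2-style pipeline, the returned indices would replace the uniformly sampled mask start positions; the rest of the contrastive or BERT-style objective is unchanged.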
Pages: 3028-3032
Page count: 5