Reducing Domain Mismatch in Self-Supervised Speech Pre-training

Citations: 0
Authors
Baskar, Murali Karthick [1]
Rosenberg, Andrew [2]
Ramabhadran, Bhuvana [2]
Zhang, Yu [2]
Affiliations
[1] Brno Univ Technol, Brno, Czech Republic
[2] Google Inc, Mountain View, CA USA
Keywords
Self-supervision; Wav2vec2; pretraining; Data selection; Domain mismatch; ASR; speech recognition
DOI
10.21437/Interspeech.2022-736
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Masked speech modeling (MSM) methods such as wav2vec2 or w2v-BERT learn representations over speech frames that are randomly masked within an utterance. While these methods improve the performance of Automatic Speech Recognition (ASR) systems, they have one major limitation: they treat all unsupervised speech samples with equal weight, which hinders learning, as not all samples carry information relevant to learning meaningful representations. In this work, we address this limitation. We propose ask2mask (ATM), a novel approach to focus on specific samples during MSM pre-training. ATM employs an external ASR model, or scorer, to weight unsupervised input samples by performing fine-grained data selection. ATM masks the highly confident input frames chosen by the scorer, which allows the model to learn meaningful representations. We conduct fine-tuning experiments on well-benchmarked corpora: LibriSpeech (matching the pre-training data), and AMI and CHiME-6 (not matching the pre-training data). The results substantiate the efficacy of ATM in significantly improving recognition performance under mismatched conditions while still yielding modest improvements under matched conditions.
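The selection idea in the abstract can be sketched in a few lines: rather than masking frames uniformly at random as in standard MSM, restrict the mask positions to frames an external scorer rates as highly confident. The function below is a minimal illustration of that idea only; the scorer, the `top_fraction` candidate pool, and the selection rule are assumptions for this sketch, not the paper's published configuration.

```python
import numpy as np

def ask2mask_positions(confidences, mask_ratio=0.15, top_fraction=0.5, rng=None):
    """Pick frame indices to mask, restricted to the most confident frames.

    A hedged sketch of the ATM idea: `confidences` is a 1-D array of
    per-frame confidences from a hypothetical external scorer. Instead of
    sampling mask positions uniformly over all frames, we sample them only
    from the top `top_fraction` of frames ranked by scorer confidence.
    All parameter values here are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    confidences = np.asarray(confidences, dtype=float)
    n_frames = confidences.shape[0]
    # Candidate pool: the most confident `top_fraction` of all frames.
    n_candidates = max(1, int(top_fraction * n_frames))
    candidates = np.argsort(confidences)[::-1][:n_candidates]
    # Mask `mask_ratio` of all frames, drawn from the confident candidates.
    n_mask = min(n_candidates, max(1, int(mask_ratio * n_frames)))
    return np.sort(rng.choice(candidates, size=n_mask, replace=False))
```

In a wav2vec2-style pipeline, the returned indices would replace the uniformly sampled mask start positions; the rest of the contrastive or BERT-style objective is unchanged.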
Pages: 3028-3032
Page count: 5