A NOISE-ROBUST SELF-SUPERVISED PRE-TRAINING MODEL BASED SPEECH REPRESENTATION LEARNING FOR AUTOMATIC SPEECH RECOGNITION

被引:12
|
作者
Zhu, Qiu-Shi [1 ]
Zhang, Jie [1 ,2 ]
Zhang, Zi-Qiang [1 ]
Wu, Ming-Hui [1 ]
Fang, Xin [1 ]
Dai, Li-Rong [1 ]
机构
[1] Univ Sci & Technol China USTC, NEL SLIP, Hefei, Peoples R China
[2] Chinese Acad Sci, Inst Acoust, State Key Lab Acoust, Beijing, Peoples R China
基金
中国国家自然科学基金;
关键词
Wav2vec2.0; speech recognition; noise robustness; self-supervised pre-training; speech representation;
D O I
10.1109/ICASSP43922.2022.9747379
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Wav2vec2.0 is a popular self-supervised pre-training framework for learning speech representations in the context of automatic speech recognition (ASR). It was shown that wav2vec2.0 has a good robustness against the domain shift, while the noise robustness is still unclear. In this work, we therefore first analyze the noise robustness of wav2vec2.0 via experiments. We observe that wav2vec2.0 pre-trained on noisy data can obtain good representations and thus improve the ASR performance on the noisy test set, which however brings a performance degradation on the clean test set. To avoid this issue, in this work we propose an enhanced wav2vec2.0 model. Specifically, the noisy speech and the corresponding clean version are fed into the same feature encoder, where the clean speech provides training targets for the model. Experimental results reveal that the proposed method can not only improve the ASR performance on the noisy test set which surpasses the original wav2vec2.0, but also ensure a tiny performance decrease on the clean test set. In addition, the effectiveness of the proposed method is demonstrated under different types of noise conditions.
引用
收藏
页码:3174 / 3178
页数:5
相关论文
共 50 条
  • [1] A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition
    Zhu, Qiu-Shi
    Zhang, Jie
    Zhang, Zi-Qiang
    Dai, Li-Rong
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1927 - 1939
  • [2] GUIDED CONTRASTIVE SELF-SUPERVISED PRE-TRAINING FOR AUTOMATIC SPEECH RECOGNITION
    Khare, Aparna
    Wu, Minhua
    Bhati, Saurabhchand
    Droppo, Jasha
    Maas, Roland
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 174 - 181
  • [3] AN ADAPTER BASED PRE-TRAINING FOR EFFICIENT AND SCALABLE SELF-SUPERVISED SPEECH REPRESENTATION LEARNING
    Kessler, Samuel
    Thomas, Bethan
    Karout, Salah
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 3179 - 3183
  • [4] Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training
    Zhang, Bowen
    Cao, Songjun
    Zhang, Xiaoming
    Zhang, Yike
    Ma, Long
    Shinozaki, Takahiro
    [J]. INTERSPEECH 2022, 2022, : 2653 - 2657
  • [5] Consistency self-supervised learning method for robust automatic speech recognition
    Gao, Changfeng
    Cheng, Gaofeng
    Zhang, Pengyuan
    [J]. Shengxue Xuebao/Acta Acustica, 2023, 48 (03): : 578 - 587
  • [6] Reducing Domain mismatch in Self-supervised speech pre-training
    Baskar, Murali Karthick
    Rosenberg, Andrew
    Ramabhadran, Bhuvana
    Zhang, Yu
    [J]. INTERSPEECH 2022, 2022, : 3028 - 3032
  • [7] Knowledge Distillation-Based Training of Speech Enhancement for Noise-Robust Automatic Speech Recognition
    Woo Lee, Geon
    Kook Kim, Hong
    Kong, Duk-Jo
    [J]. IEEE ACCESS, 2024, 12 : 72707 - 72720
  • [8] An Overview of Noise-Robust Automatic Speech Recognition
    Li, Jinyu
    Deng, Li
    Gong, Yifan
    Haeb-Umbach, Reinhold
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04) : 745 - 777
  • [9] Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training
    Huang, Sung-Feng
    Chuang, Shun-Po
    Liu, Da-Rong
    Chen, Yi-Chen
    Yang, Gene-Ping
    Lee, Hung-yi
    [J]. INTERSPEECH 2021, 2021, : 3056 - 3060
  • [10] COMPARISON OF SELF-SUPERVISED SPEECH PRE-TRAINING METHODS ON FLEMISH DUTCH
    Poncelet, Jakob
    Hamme, Hugo Van
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 169 - 176