UNSUPERVISED PRE-TRAINING OF BIDIRECTIONAL SPEECH ENCODERS VIA MASKED RECONSTRUCTION

Cited by: 0
Authors
Wang, Weiran [1 ]
Tang, Qingming [2 ]
Livescu, Karen [2 ]
Affiliations
[1] Amazon Alexa, San Francisco, CA 94110 USA
[2] Toyota Technol Inst Chicago, Chicago, IL USA
Keywords
Unsupervised representation learning; Pre-training; Masked reconstruction
DOI
10.1109/icassp40776.2020.9053541
CLC number
O42 [Acoustics]
Discipline code
070206; 082403
Abstract
We propose an approach for pre-training speech representations via a masked reconstruction loss. Our pre-trained encoder networks are bidirectional and can therefore be used directly in typical bidirectional speech recognition models. The pre-trained networks can then be fine-tuned on a smaller amount of supervised data for speech recognition. Experiments with this approach on the LibriSpeech and Wall Street Journal corpora show promising results. We find that the main factors that lead to speech recognition improvements are: masking segments of sufficient width in both time and frequency, pre-training on a much larger amount of unlabeled data than the labeled data, and domain adaptation when the unlabeled and labeled data come from different domains. The gain from pre-training is additive to that of supervised data augmentation.
Pages: 6889-6893
Number of pages: 5
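To make the abstract's description concrete, here is a minimal sketch of masked-reconstruction pre-training in PyTorch. It is not the authors' exact recipe: the class and function names, the mask widths and counts, the 3-layer BLSTM configuration, and the mean-squared-error loss are all illustrative assumptions. The sketch masks random blocks of a log-mel spectrogram in both time and frequency (the abstract's "segments of sufficient width in both time and frequency") and trains a bidirectional encoder to reconstruct only the masked positions.

```python
# Minimal sketch of masked-reconstruction pre-training (assumptions noted above);
# a hypothetical illustration, not the paper's published implementation.
import torch
import torch.nn as nn

class MaskedReconstructionPretrainer(nn.Module):
    """Bidirectional encoder pre-trained to reconstruct masked spectrogram regions."""
    def __init__(self, n_mels=80, hidden=256):
        super().__init__()
        # Bidirectional LSTM encoder: reusable directly in a BLSTM-based ASR model.
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=3,
                               batch_first=True, bidirectional=True)
        # Linear head that maps encoder states back to spectrogram frames.
        self.reconstruct = nn.Linear(2 * hidden, n_mels)

def mask_spectrogram(x, time_width=20, freq_width=20, n_time=2, n_freq=2):
    """Zero out random time and frequency blocks; return masked input and mask."""
    x = x.clone()
    mask = torch.zeros_like(x, dtype=torch.bool)
    B, T, F = x.shape
    for b in range(B):
        for _ in range(n_time):  # contiguous blocks along the time axis
            t0 = torch.randint(0, max(1, T - time_width), (1,)).item()
            mask[b, t0:t0 + time_width, :] = True
        for _ in range(n_freq):  # contiguous blocks along the frequency axis
            f0 = torch.randint(0, max(1, F - freq_width), (1,)).item()
            mask[b, :, f0:f0 + freq_width] = True
    x[mask] = 0.0
    return x, mask

def pretrain_step(model, feats, optimizer):
    """One unsupervised step: feats is a (batch, time, n_mels) log-mel tensor."""
    masked, mask = mask_spectrogram(feats)
    hidden, _ = model.encoder(masked)
    pred = model.reconstruct(hidden)
    # Reconstruction loss is computed only on the masked positions.
    loss = ((pred - feats) ** 2)[mask].mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After pre-training on a large unlabeled corpus, the encoder weights would be kept and fine-tuned on the smaller labeled set with a standard supervised ASR objective, as the abstract describes.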