GUIDED CONTRASTIVE SELF-SUPERVISED PRE-TRAINING FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 0
Authors
Khare, Aparna [1 ]
Wu, Minhua [1 ]
Bhati, Saurabhchand [1 ,2 ]
Droppo, Jasha [1 ]
Maas, Roland [1 ]
Affiliations
[1] Amazon Alexa, Seattle, WA 98109 USA
[2] Johns Hopkins Univ, Baltimore, MD 21218 USA
Keywords
Self-supervised learning; RNN-T; ASR
DOI
10.1109/SLT54892.2023.10022676
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model. It can be used to effectively initialize the encoder of an Automatic Speech Recognition (ASR) model. We present a novel modification of CPC called Guided Contrastive Predictive Coding (GCPC). Our proposed method maximizes the mutual information between representations from a prior-knowledge model and the output of the model being pre-trained, allowing prior knowledge to be injected during pre-training. We validate our method on three ASR tasks: German, French and English. Our method outperforms CPC pre-training on all three datasets: relative to training from scratch, it reduces the Word Error Rate (WER) by 4.44%, 6.55% and 15.43% on the German, French and English (LibriSpeech) tasks respectively, whereas CPC pre-training yields only 2.96%, 1.01% and 14.39% relative WER reductions.
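The abstract describes GCPC only at the level of its objective, so the following is a minimal sketch, not the paper's implementation: a generic InfoNCE-style contrastive loss in PyTorch in which the targets come from a frozen prior-knowledge model rather than from the encoder's own future latents. All module names (`encoder`, `prior_model`), shapes, and the temperature value are assumptions for illustration.

```python
# Minimal sketch of a guided contrastive (InfoNCE-style) objective.
# Assumption: the pre-trained encoder's outputs and the frozen
# prior-knowledge model's representations are time-aligned.
import torch
import torch.nn.functional as F

def guided_contrastive_loss(c, z, temperature=0.1):
    """InfoNCE between encoder outputs `c` and prior-model
    representations `z`, both shaped (batch, time, dim).
    The matching time step is the positive; all other time steps
    in the same utterance serve as negatives.
    """
    c = F.normalize(c, dim=-1)
    z = F.normalize(z, dim=-1)
    # (batch, time, time): similarity of every c_t with every z_t'
    logits = torch.bmm(c, z.transpose(1, 2)) / temperature
    # Positive pairs lie on the diagonal: target index t for row t.
    targets = torch.arange(logits.size(1), device=logits.device)
    targets = targets.unsqueeze(0).expand(logits.size(0), -1)
    return F.cross_entropy(logits.flatten(0, 1), targets.flatten())

# Usage sketch: update the encoder, keep the prior model frozen.
batch, time, dim = 8, 100, 256
feats = torch.randn(batch, time, 80)           # e.g., log-mel features
encoder = torch.nn.Linear(80, dim)             # stand-in for the real encoder
prior_model = torch.nn.Linear(80, dim).eval()  # stand-in prior-knowledge model
with torch.no_grad():
    z = prior_model(feats)                     # guidance representations
loss = guided_contrastive_loss(encoder(feats), z)
loss.backward()
```

For contrast, standard CPC would score `c_t` against the encoder's own latents at future time steps; swapping those targets for representations from a separate prior-knowledge model is, per the abstract, what injects prior knowledge during pre-training.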
Pages: 174-181 (8 pages)
Related papers (showing 10 of 50)
  • [1] Wang, Xinlong; Zhang, Rufeng; Shen, Chunhua; Kong, Tao; Li, Lei. Dense Contrastive Learning for Self-Supervised Visual Pre-Training. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021: 3023-3032.
  • [2] Chen, Pengfei; Li, Leida; Wu, Jinjian; Dong, Weisheng; Shi, Guangming. Contrastive Self-Supervised Pre-Training for Video Quality Assessment. IEEE Transactions on Image Processing, 2022, 31: 458-471.
  • [3] Zhu, Qiu-Shi; Zhang, Jie; Zhang, Zi-Qiang; Wu, Ming-Hui; Fang, Xin; Dai, Li-Rong. A Noise-Robust Self-Supervised Pre-Training Model Based Speech Representation Learning for Automatic Speech Recognition. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 3174-3178.
  • [4] Baskar, Murali Karthick; Rosenberg, Andrew; Ramabhadran, Bhuvana; Zhang, Yu. Reducing Domain Mismatch in Self-Supervised Speech Pre-Training. Interspeech 2022, 2022: 3028-3032.
  • [5] Zhang, Bowen; Cao, Songjun; Zhang, Xiaoming; Zhang, Yike; Ma, Long; Shinozaki, Takahiro. Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training. Interspeech 2022, 2022: 2653-2657.
  • [6] Liu, Han; Zhao, Zhenbo; She, Qiang. Self-supervised ECG Pre-training. Biomedical Signal Processing and Control, 2021, 70.
  • [7] Qiao, Xin; Ge, Chenyang; Zhao, Chaoqiang; Tosi, Fabio; Poggi, Matteo; Mattoccia, Stefano. Self-supervised Depth Super-resolution with Contrastive Multiview Pre-training. Neural Networks, 2023, 168: 223-236.
  • [8] Lam-Yee-Mui, Lea-Marie; Yang, Lucas Ondel; Klejch, Ondrej. Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models. Interspeech 2023, 2023: 87-91.
  • [9] Huang, Sung-Feng; Chuang, Shun-Po; Liu, Da-Rong; Chen, Yi-Chen; Yang, Gene-Ping; Lee, Hung-yi. Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training. Interspeech 2021, 2021: 3056-3060.
  • [10] Poncelet, Jakob; Hamme, Hugo Van. Comparison of Self-Supervised Speech Pre-Training Methods on Flemish Dutch. 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021: 169-176.