GUIDED CONTRASTIVE SELF-SUPERVISED PRE-TRAINING FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 0
Authors:
Khare, Aparna [1]
Wu, Minhua [1]
Bhati, Saurabhchand [1,2]
Droppo, Jasha [1]
Maas, Roland [1]
Affiliations:
[1] Amazon Alexa, Seattle, WA 98109 USA
[2] Johns Hopkins Univ, Baltimore, MD 21218 USA
Keywords:
Self-supervised learning; RNN-T; ASR
DOI:
10.1109/SLT54892.2023.10022676
CLC number:
TP18 [Artificial Intelligence Theory]
Discipline classification codes:
081104; 0812; 0835; 1405
Abstract:
Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model. It can be used to effectively initialize the encoder of an Automatic Speech Recognition (ASR) model. We present a novel modification of CPC called Guided Contrastive Predictive Coding (GCPC). Our proposed method maximizes the mutual information between representations from a prior-knowledge model and the output of the model being pre-trained, allowing prior knowledge to be injected during pre-training. We validate our method on three ASR tasks: German, French and English. Our method outperforms CPC pre-training on all three datasets: relative to training from scratch, it reduces the Word Error Rate (WER) by 4.44%, 6.55% and 15.43% on the German, French and English (Librispeech) tasks respectively, whereas CPC pre-training yields only 2.96%, 1.01% and 14.39% relative WER reductions.
Pages: 174-181 (8 pages)
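
The abstract describes GCPC as maximizing mutual information between a frozen prior-knowledge model's representations and the outputs of the encoder being pre-trained. As a rough illustration, below is a minimal sketch of an InfoNCE-style guided contrastive loss in PyTorch. The function name guided_infonce_loss, the tensor shapes, and the choice of negatives (other time steps within the same utterance) are assumptions made for illustration, not the paper's actual formulation.

```python
# Minimal sketch of an InfoNCE-style guided contrastive objective, assuming:
#   - `prior_feats` come from a frozen prior-knowledge model, already
#     projected to the encoder's output dimension D (hypothetical setup);
#   - negatives for each frame are the other frames of the same utterance.
# Illustrative only; not the authors' implementation.
import torch
import torch.nn.functional as F

def guided_infonce_loss(enc_out: torch.Tensor,
                        prior_feats: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    """enc_out:     (B, T, D) outputs of the encoder being pre-trained.
    prior_feats: (B, T, D) frozen prior-knowledge-model representations.
    """
    B, T, _ = enc_out.shape
    q = F.normalize(enc_out, dim=-1)               # queries (trainable path)
    k = F.normalize(prior_feats.detach(), dim=-1)  # targets (frozen path)

    # (B, T, T) cosine similarities between every query frame and every
    # prior-model frame in the same utterance.
    logits = torch.einsum('btd,bsd->bts', q, k) / temperature

    # The time-aligned prior frame is the positive; all other frames in the
    # utterance serve as negatives.
    targets = torch.arange(T, device=enc_out.device).repeat(B)
    return F.cross_entropy(logits.reshape(B * T, T), targets)

# Toy usage with random tensors standing in for real features.
if __name__ == "__main__":
    enc_out = torch.randn(4, 100, 256)      # encoder outputs
    prior_feats = torch.randn(4, 100, 256)  # frozen prior-model features
    print(guided_infonce_loss(enc_out, prior_feats))
```

Because the prior-knowledge representations are detached, gradients flow only into the encoder being pre-trained, which matches the paper's idea of injecting prior knowledge rather than updating the prior model.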