IMPROVING PSEUDO-LABEL TRAINING FOR END-TO-END SPEECH RECOGNITION USING GRADIENT MASK

Cited by: 5
Authors
Ling, Shaoshi [1]
Shen, Chen [1]
Cai, Meng [1]
Ma, Zejun [1]
Affiliations
[1] Bytedance AI Lab, Shanghai, People's Republic of China
Keywords
speech recognition; semi-supervised learning; pseudo-labeling; end-to-end model;
DOI
10.1109/ICASSP43922.2022.9746249
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In recent work on semi-supervised speech recognition, both self-supervised representation learning and pseudo-labeling have shown promising results. In this paper, we propose a novel approach that combines their ideas for end-to-end speech recognition models. Without any extra loss function, we use the Gradient Mask to optimize the model when training on pseudo-labels. This method forces the speech recognition model to predict from masked input, so that it learns a strong acoustic representation and training becomes robust to label noise. In our semi-supervised experiments, the method improves the model's performance when training on pseudo-labels, and it achieves competitive results compared with other semi-supervised approaches on the Librispeech 100 hours experiments.
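The abstract does not spell out how the Gradient Mask is computed, so the following is only a minimal sketch of one plausible reading: some input frames are masked, and when training on pseudo-labels only the gradients at those masked frames are kept. The function names, the span-masking scheme, and the values of `mask_prob` and `span` are assumptions made for illustration and are not taken from the paper.

```python
import random


def sample_span_mask(num_frames, mask_prob=0.065, span=10, seed=0):
    """Return one boolean per frame; True marks a masked frame.

    Each frame starts a masked span of length `span` with probability
    `mask_prob`. Both values are hypothetical defaults.
    """
    rng = random.Random(seed)
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span, num_frames)):
                mask[t] = True
    return mask


def mask_features(features, mask, fill=0.0):
    """Replace every masked frame with a constant fill vector."""
    return [
        [fill] * len(frame) if masked else list(frame)
        for frame, masked in zip(features, mask)
    ]


def apply_gradient_mask(frame_grads, mask):
    """Keep per-frame gradients at masked frames and zero the rest.

    Assumption: on pseudo-labeled data, only masked positions are
    allowed to contribute to the parameter update.
    """
    return [
        list(grad) if masked else [0.0] * len(grad)
        for grad, masked in zip(frame_grads, mask)
    ]
```

In an actual training loop the same mask would be applied to the input before the forward pass and to the per-frame gradients during the backward pass; on labeled data the gradient masking step would presumably be skipped, although the abstract does not say so.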
Pages: 8397-8401
Number of pages: 5
Related papers
50 in total
  • [31] End-to-end multilingual speech recognition system with language supervision training
    Liu, Danyang
    Xu, Ji
    Zhang, Pengyuan
    IEICE Transactions on Information and Systems, 2020, E103D (06) : 1427 - 1430
  • [32] EXPLORING MODEL UNITS AND TRAINING STRATEGIES FOR END-TO-END SPEECH RECOGNITION
    Huang, Mingkun
    Lu, Yizhou
    Wang, Lan
    Qian, Yanmin
    Yu, Kai
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 524 - 531
  • [33] Towards end-to-end training of automatic speech recognition for Nigerian Pidgin
    Ajisafe, Daniel
    Adegboro, Oluwabukola
    Oduntan, Esther
    Arulogun, Tayo
    arXiv, 2020,
  • [34] Large Margin Training for Attention Based End-to-End Speech Recognition
    Wang, Peidong
    Cui, Jia
    Weng, Chao
    Yu, Dong
    INTERSPEECH 2019, 2019, : 246 - 250
  • [35] END-TO-END WHISPERED SPEECH RECOGNITION WITH FREQUENCY-WEIGHTED APPROACHES AND PSEUDO WHISPER PRE-TRAINING
    Chang, Heng-Jui
    Liu, Alexander H.
    Lee, Hung-yi
    Lee, Lin-shan
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 186 - 193
  • [36] END-TO-END DYSARTHRIC SPEECH RECOGNITION USING MULTIPLE DATABASES
    Takashima, Yuki
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6395 - 6399
  • [37] Arabic speech recognition using end-to-end deep learning
    Alsayadi, Hamzah A.
    Abdelhamid, Abdelaziz A.
    Hegazy, Islam
    Fayed, Zaki T.
    IET SIGNAL PROCESSING, 2021, 15 (08) : 521 - 534
  • [38] End-to-End Label Uncertainty Modeling in Speech Emotion Recognition Using Bayesian Neural Networks and Label Distribution Learning
    Prabhu, Navin Raj
    Lehmann-Willenbrock, Nale
    Gerkmann, Timo
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (02) : 579 - 592
  • [39] Low Latency Speech Recognition using End-to-End Prefetching
    Chang, Shuo-Yiin
    Li, Bo
    Rybach, David
    He, Yanzhang
    Li, Wei
    Sainath, Tara
    Strohman, Trevor
    INTERSPEECH 2020, 2020, : 1962 - 1966
  • [40] End-to-End Spontaneous Speech Recognition Using Hesitation Labeling
    Horii, Koharu
    Fukuda, Meiko
    Ohta, Kengo
    Nishimura, Ryota
    Ogawa, Atsunori
    Kitaoka, Norihide
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 1077 - 1081