IMPROVING PSEUDO-LABEL TRAINING FOR END-TO-END SPEECH RECOGNITION USING GRADIENT MASK

Cited by: 5
Authors
Ling, Shaoshi [1 ]
Shen, Chen [1 ]
Cai, Meng [1 ]
Ma, Zejun [1 ]
Affiliations
[1] Bytedance AI Lab, Shanghai, People's Republic of China
Keywords
speech recognition; semi-supervised learning; pseudo-labeling; end-to-end model
DOI
10.1109/ICASSP43922.2022.9746249
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Among recent trends in semi-supervised speech recognition, both self-supervised representation learning and pseudo-labeling have shown promising results. In this paper, we propose a novel approach that combines their ideas for end-to-end speech recognition models. Without any extra loss function, we use the Gradient Mask to optimize the model when training on pseudo-labels. This method forces the speech recognition model to predict from masked input, so that it learns a strong acoustic representation and training becomes robust to label noise. In our semi-supervised experiments, the method improves the model's performance when training on pseudo-labels and achieves competitive results compared with other semi-supervised approaches on the Librispeech 100-hour experiments.
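The abstract does not spell out the masking scheme, so the following is only a minimal sketch of one way the idea could look, not the authors' implementation. It assumes that spans of input frames are masked and that, when training on pseudo-labels, encoder gradients are kept only at the masked frames. The function names, `span_len`, and `mask_prob` are hypothetical and chosen for illustration.

```python
import random


def sample_span_mask(num_frames, span_len=10, mask_prob=0.065, seed=0):
    """Return one boolean per frame; True marks a masked frame.

    Each frame starts a masked span of span_len frames with probability
    mask_prob (both values are illustrative assumptions).
    """
    rng = random.Random(seed)
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span_len, num_frames)):
                mask[t] = True
    return mask


def mask_input(features, mask, fill=0.0):
    """Replace every masked frame with a constant vector."""
    return [[fill] * len(frame) if m else list(frame)
            for frame, m in zip(features, mask)]


def mask_gradient(frame_grads, mask):
    """Keep per-frame gradients only at masked frames, zero them elsewhere.

    Assumption: this stands in for blocking the gradient of unmasked
    frames so that noisy pseudo-labels update the encoder only through
    positions it had to predict from context.
    """
    return [list(g) if m else [0.0] * len(g)
            for g, m in zip(frame_grads, mask)]
```

In an actual training loop the same mask would be applied twice: once to the input features before the forward pass, and once to the encoder gradients during the backward pass on pseudo-labeled batches.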
Pages: 8397-8401
Page count: 5