Cross-Lingual Self-training to Learn Multilingual Representation for Low-Resource Speech Recognition

Cited: 0
Authors
Zi-Qiang Zhang
Yan Song
Ming-Hui Wu
Xin Fang
Ian McLoughlin
Li-Rong Dai
Institutions
[1] University of Science and Technology of China (USTC), ICT Cluster
[2] Singapore Institute of Technology
Keywords
Multilingual representation learning; Cross-lingual self-training; Low-resource speech recognition; Speech pre-training;
DOI: not available
Abstract
Representation learning, or pre-training, has shown promising performance for low-resource speech recognition, which suffers from data scarcity. Recently, self-supervised methods have achieved impressive results in speech pre-training by effectively exploiting large amounts of unannotated data. In this paper, we propose a new pre-training framework, Cross-Lingual Self-Training (XLST), to further improve the effectiveness of multilingual representation learning. Specifically, XLST first trains a phoneme classification model on a small amount of annotated data from a non-target language, and then uses it to produce initial targets for training a second model on multilingual unannotated data, i.e., by maximizing the frame-level similarity between the output embeddings of the two models. Furthermore, we employ moving-average and multi-view data-augmentation mechanisms so that the learned representations generalize better. Experimental results on downstream speech recognition tasks for five low-resource languages demonstrate the effectiveness of XLST. Specifically, leveraging an additional 100 hours of annotated English data for pre-training, the proposed XLST achieves a relative 24.8% PER reduction over state-of-the-art self-supervised methods.
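The abstract describes two concrete ingredients: a frame-level similarity objective between a teacher (the phoneme classifier trained on annotated non-target-language data) and a student (trained on unannotated multilingual audio), plus a moving-average update of the teacher. The sketch below illustrates those two pieces in NumPy under stated assumptions; the shapes, the cosine form of the similarity, and the EMA decay value are illustrative choices, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ema_update(teacher_w, student_w, decay=0.999):
    """Moving-average (EMA) update of teacher weights toward the student,
    a common realization of the 'moving average mechanism' in the abstract."""
    return decay * teacher_w + (1 - decay) * student_w

def frame_cosine_similarity(a, b, eps=1e-8):
    """Mean frame-level cosine similarity between two (frames, dim)
    embedding sequences; training would maximize this quantity."""
    num = (a * b).sum(axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return (num / den).mean()

# Toy stand-ins: teacher embeddings would come from the phoneme classifier
# trained on annotated English; student embeddings from the model being
# pre-trained on unannotated multilingual audio (e.g. an augmented view).
teacher_emb = rng.standard_normal((100, 256))   # 100 frames, 256-dim
student_emb = rng.standard_normal((100, 256))

# Loss to minimize = negative frame-level similarity.
loss = -frame_cosine_similarity(student_emb, teacher_emb)

# After each student optimizer step, the teacher tracks the student by EMA.
teacher_w = rng.standard_normal(10)             # toy weight vector
student_w = rng.standard_normal(10)
teacher_w = ema_update(teacher_w, student_w)
```

Multi-view augmentation would enter by feeding the student a perturbed view of the same utterance whose clean view produced the teacher targets; the objective itself is unchanged.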
Pages: 6827–6843 (16 pages)
Related Papers (50 records)
  • [1] Cross-Lingual Self-training to Learn Multilingual Representation for Low-Resource Speech Recognition
    Zhang, Zi-Qiang
    Song, Yan
    Wu, Ming-Hui
    Fang, Xin
    McLoughlin, Ian
    Dai, Li-Rong
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (12) : 6827 - 6843
  • [2] A Comparative Study of BNF and DNN Multilingual Training on Cross-lingual Low-resource Speech Recognition
    Xu, Haihua
    Van Hai Do
    Xiao, Xiong
    Chng, Eng-Siong
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2132 - 2136
  • [3] Cross-Lingual Summarization Method Based on Joint Training and Self-Training in Low-Resource Scenarios
    Cheng, Shaohuan
    Tang, Yujia
    Liu, Qiao
    Chen, Wenyu
    [J]. Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2024, 53 (05): : 762 - 770
  • [4] Cross-Lingual Language Modeling for Low-Resource Speech Recognition
    Xu, Ping
    Fung, Pascale
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (06): : 1134 - 1144
  • [5] Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition
    Hou, Wenxin
    Zhu, Han
    Wang, Yidong
    Wang, Jindong
    Qin, Tao
    Xu, Renju
    Shinozaki, Takahiro
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 317 - 329
  • [6] Cross-Lingual and Ensemble MLPs Strategies for Low-Resource Speech Recognition
    Qian, Yanmin
    Liu, Jia
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2581 - 2584
  • [7] LEARNING CROSS-LINGUAL INFORMATION WITH MULTILINGUAL BLSTM FOR SPEECH SYNTHESIS OF LOW-RESOURCE LANGUAGES
    Yu, Quanjie
    Liu, Peng
    Wu, Zhiyong
    Kang, Shiyin
    Meng, Helen
    Cai, Lianhong
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5545 - 5549
  • [8] Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition
    Lu, Liang
    Ghoshal, Arnab
    Renals, Steve
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (01) : 17 - 27
  • [10] SUBSPACE MIXTURE MODEL FOR LOW-RESOURCE SPEECH RECOGNITION IN CROSS-LINGUAL SETTINGS
    Miao, Yajie
    Metze, Florian
    Waibel, Alex
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7339 - 7343