Improving Transformer-based Speech Recognition with Unsupervised Pre-training and Multi-task Semantic Knowledge Learning

Cited by: 5
Authors
Li, Song [1 ]
Li, Lin [1 ]
Hong, Qingyang [2 ]
Liu, Lingling [1 ]
Affiliations
[1] Xiamen Univ, Sch Elect Sci & Engn, Xiamen, Fujian, Peoples R China
[2] Xiamen Univ, Sch Informat, Xiamen, Fujian, Peoples R China
Source
INTERSPEECH 2020
Funding
National Natural Science Foundation of China;
Keywords
unsupervised pre-training; speech recognition; Transformer; multi-task learning; semi-supervised learning;
DOI
10.21437/Interspeech.2020-2007
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline Classification Codes
100104; 100213;
Abstract
Recently, Transformer-based end-to-end speech recognition systems have become the state of the art. However, one prominent problem with current end-to-end speech recognition systems is that a large amount of paired speech-text data is required to achieve good recognition performance. To address this issue, we propose two unsupervised pre-training strategies, one for the encoder and one for the decoder of the Transformer, which make full use of unpaired data for training. In addition, we propose a new semi-supervised fine-tuning method, named multi-task semantic knowledge learning, which strengthens the Transformer's ability to learn semantic knowledge and thereby improves system performance. With the proposed methods we achieve the best CER on the AISHELL-1 test set, 5.9%, which exceeds the best end-to-end model by 10.6% relative CER. Moreover, relative CER reductions of 20.3% and 17.8% are obtained on low-resource Mandarin and English data sets, respectively.
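The record gives no implementation detail for the multi-task semantic knowledge learning step, so the following is only a minimal sketch of the general idea, assuming a PyTorch setting: a speech-recognition cross-entropy loss on the decoder outputs is interpolated with an auxiliary, BERT-style masked-prediction loss intended to inject semantic knowledge during fine-tuning. All identifiers (multitask_loss, lambda_asr, pad_id) are hypothetical and are not taken from the paper.

# Minimal sketch of a multi-task fine-tuning loss; an assumption about the
# general recipe, not the authors' exact formulation.
import torch
import torch.nn.functional as F

def multitask_loss(asr_logits, asr_targets, mlm_logits, mlm_targets,
                   lambda_asr=0.7, pad_id=0):
    """Interpolate an ASR cross-entropy loss with an auxiliary semantic
    (masked-prediction) loss. Logits: (batch, seq_len, vocab); targets:
    (batch, seq_len). lambda_asr is a hypothetical weighting factor."""
    # cross_entropy expects class scores on dim 1, hence the transpose.
    asr_loss = F.cross_entropy(
        asr_logits.transpose(1, 2), asr_targets, ignore_index=pad_id)
    mlm_loss = F.cross_entropy(
        mlm_logits.transpose(1, 2), mlm_targets, ignore_index=pad_id)
    return lambda_asr * asr_loss + (1.0 - lambda_asr) * mlm_loss

# Toy usage with random tensors: batch of 2, sequence length 5, vocab of 10.
if __name__ == "__main__":
    asr_logits = torch.randn(2, 5, 10)
    mlm_logits = torch.randn(2, 5, 10)
    asr_targets = torch.randint(1, 10, (2, 5))
    mlm_targets = torch.randint(1, 10, (2, 5))
    print(multitask_loss(asr_logits, asr_targets, mlm_logits, mlm_targets))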
Pages: 5006 - 5010
Page count: 5
Related Papers
50 records in total
  • [1] Transformer-based transfer learning and multi-task learning for improving the performance of speech emotion recognition
    Park, Sunchan
    Kim, Hyung Soon
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (05): 515 - 522
  • [2] Multi-task Pre-training for Lhasa-Tibetan Speech Recognition
    Liu, Yigang
    Zhao, Yue
    Xu, Xiaona
    Xu, Liang
    Zhang, Xubei
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT IX, 2023, 14262 : 78 - 90
  • [3] Multi-task Active Learning for Pre-trained Transformer-based Models
    Rotman, Guy
    Reichart, Roi
    [J]. TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2022, 10 : 1209 - 1228
  • [4] Multi-task Pre-training Language Model for Semantic Network Completion
    Li, Da
    Zhu, Boqing
    Yang, Sen
    Xu, Kele
    Yi, Ming
    He, Yukai
    Wang, Huaimin
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (11)
  • [5] A Study of Speech Recognition for Kazakh Based on Unsupervised Pre-Training
    Meng, Weijing
    Yolwas, Nurmemet
    [J]. SENSORS, 2023, 23 (02)
  • [6] Multi-task Pre-training with Soft Biometrics for Transfer-learning Palmprint Recognition
    Xu, Huanhuan
    Leng, Lu
    Yang, Ziyuan
    Teoh, Andrew Beng Jin
    Jin, Zhe
    [J]. NEURAL PROCESSING LETTERS, 2023, 55 (03) : 2341 - 2358
  • [7] Transformer Based Unsupervised Pre-training for Acoustic Representation Learning
    Zhang, Ruixiong
    Wu, Haiwei
    Li, Wubo
    Jiang, Dongwei
    Zou, Wei
    Li, Xiangang
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 6933 - 6937
  • [8] Improving Short Answer Grading Using Transformer-Based Pre-training
    Sung, Chul
    Dhamecha, Tejas Indulal
    Mukhi, Nirmal
    [J]. ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2019), PT I, 2019, 11625 : 469 - 481
  • [9] Improving AMR-to-text Generation with Multi-task Pre-training
    Xu, Dong-Qin
    Li, Jun-Hui
    Zhu, Mu-Hua
    Zhou, Guo-Dong
    [J]. Ruan Jian Xue Bao/Journal of Software, 2021, 32 (10): 3036 - 3050