LATTICEBART: LATTICE-TO-LATTICE PRE-TRAINING FOR SPEECH RECOGNITION

Times Cited: 1
Authors
Dai, Lingfeng
Chen, Lu [1]
Zhou, Zhikai
Yu, Kai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Dept Comp Sci & Engn, X-LANCE Lab, Shanghai, Peoples R China
Keywords
speech recognition; lattice-to-lattice; lattice-to-sequence; pre-trained language model;
DOI
10.1109/ICASSP43922.2022.9746796
CLC Classification Number
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
To improve automatic speech recognition (ASR), a growing body of work corrects ASR output with advanced sequence models. However, ASR output differs significantly from the input expected by standard sequence models: to preserve richer information, it is often a compact lattice structure that encodes multiple candidate sentences. This mismatch in input form significantly limits what sequence models can do. On the one hand, widely used pre-trained models cannot take lattice structures as input directly and are therefore difficult to apply to this task. On the other hand, the scarcity of supervised training data requires the model to learn from limited data. To address these problems, we propose LatticeBART, a model that decodes a sequence from a lattice in an end-to-end fashion and can exploit the prior of pre-trained language models. In addition, this paper proposes a lattice-to-lattice pre-training method that can be used when annotated data is missing, training on lattices that are easily generated with the ASR system. Experimental results show that our model effectively improves the output quality of the ASR system.
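The record does not spell out how LatticeBART actually feeds a lattice to a BART-style model, so the sketch below is only an illustration under assumptions of our own: the `Arc`, `Lattice`, and `linearize` names are hypothetical, and the linearization (flattening the lattice DAG in topological order so that competing word hypotheses stay adjacent) is one plausible input encoding, not the paper's confirmed method.

```python
# Minimal sketch (not the paper's code): a word lattice as a DAG and a
# simple linearization that a BART-style sequence model could consume.
# Arc, Lattice, and linearize are illustrative names, not LatticeBART's API.
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Arc:
    src: int          # start node id
    dst: int          # end node id
    word: str         # word hypothesis carried by this arc
    posterior: float  # ASR posterior probability of the arc


class Lattice:
    def __init__(self, arcs):
        self.arcs = arcs

    def topological_nodes(self):
        """Return node ids in topological order (lattices are acyclic)."""
        indeg = defaultdict(int)
        out = defaultdict(list)
        nodes = set()
        for a in self.arcs:
            out[a.src].append(a)
            indeg[a.dst] += 1
            nodes.update((a.src, a.dst))
        queue = deque(n for n in nodes if indeg[n] == 0)
        order = []
        while queue:
            n = queue.popleft()
            order.append(n)
            for a in out[n]:
                indeg[a.dst] -= 1
                if indeg[a.dst] == 0:
                    queue.append(a.dst)
        return order


def linearize(lattice):
    """Flatten the lattice into one token stream by visiting nodes in
    topological order and emitting each node's outgoing arc words
    (best posterior first), so competing hypotheses stay adjacent."""
    out = defaultdict(list)
    for a in lattice.arcs:
        out[a.src].append(a)
    tokens = []
    for node in lattice.topological_nodes():
        for arc in sorted(out[node], key=lambda a: -a.posterior):
            tokens.append(arc.word)
    return tokens


if __name__ == "__main__":
    # Two competing hypotheses: "a big cat" vs. "a pig cat"
    arcs = [Arc(0, 1, "a", 1.0),
            Arc(1, 2, "big", 0.6), Arc(1, 2, "pig", 0.4),
            Arc(2, 3, "cat", 1.0)]
    print(linearize(Lattice(arcs)))  # ['a', 'big', 'pig', 'cat']
```

Under the same assumption, the lattice-to-lattice pre-training described in the abstract would pair such ASR-generated lattices on the encoder and decoder sides, which is why it can be applied when no human transcripts are available.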
Pages: 6112-6116
Page count: 5