JOINT CTC-ATTENTION BASED END-TO-END SPEECH RECOGNITION USING MULTI-TASK LEARNING

被引:0
|
作者
Kim, Suyoun [1 ,2 ]
Hori, Takaaki [1 ]
Watanabe, Shinji [1 ]
机构
[1] Mitsubishi Elect Res Labs, Cambridge, MA 02139 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
关键词
end-to-end; speech recognition; connectionist temporal classification; attention; multi-task learning;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any pre-defined alignments. One approach is the attention-based encoder-decoder framework that learns a mapping between variable-length input and output sequences in one step using a purely data-driven method. The attention model has often been shown to improve the performance over another end-to-end approach, the Connectionist Temporal Classification (CTC), mainly because it explicitly uses the history of the target character without any conditional independence assumptions. However, we observed that the performance of the attention has shown poor results in noisy condition and is hard to learn in the initial training stage with long input sequences. This is because the attention model is too flexible to predict proper alignments in such cases due to the lack of left-to-right constraints as used in CTC. This paper presents a novel method for end-to-end speech recognition to improve robustness and achieve fast convergence by using a joint CTC-attention model within the multi-task learning framework, thereby mitigating the alignment issue. An experiment on the WSJ and CHiME-4 tasks demonstrates its advantages over both the CTC and attention-based encoder-decoder baselines, showing 5.4-14.6% relative improvements in Character Error Rate (CER).
引用
收藏
页码:4835 / 4839
页数:5
相关论文
共 50 条
  • [41] END-TO-END SPEECH RECOGNITION USING A HIGH RANK LSTM-CTC BASED MODEL
    Shi, Yangyang
    Hwang, Mei-Yuh
    Lei, Xin
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7080 - 7084
  • [42] An End-to-end Speech Recognition Algorithm based on Attention Mechanism
    Chen, Jia-nan
    Gao, Shuang
    Sun, Han-zhe
    Liu, Xiao-hui
    Wang, Zi-ning
    Zheng, Yan
    [J]. PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 2935 - 2940
  • [43] TRIGGERED ATTENTION FOR END-TO-END SPEECH RECOGNITION
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5666 - 5670
  • [44] End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition
    Kim, Suyoun
    Lane, Ian
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3867 - 3871
  • [45] IMPROVING HYBRID CTC/ATTENTION END-TO-END SPEECH RECOGNITION WITH PRETRAINED ACOUSTIC AND LANGUAGE MODELS
    Deng, Keqi
    Cao, Songjun
    Zhang, Yike
    Ma, Long
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 76 - 82
  • [46] Towards end-to-end Cyberthreat Detection from Twitter using Multi-Task Learning
    Dionisio, Nuno
    Alves, Fernando
    Ferreira, Pedro M.
    Bessani, Alysson
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [47] STREAM ATTENTION-BASED MULTI-ARRAY END-TO-END SPEECH RECOGNITION
    Wang, Xiaofei
    Li, Ruizhi
    Mallidi, Sri Harish
    Hori, Takaaki
    Watanabe, Shinji
    Hermansky, Hynek
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7105 - 7109
  • [48] Noise-robust Attention Learning for End-to-End Speech Recognition
    Higuchi, Yosuke
    Tawara, Naohiro
    Ogawa, Atsunori
    Iwata, Tomoharu
    Kobayashi, Tetsunori
    Ogawa, Tetsuji
    [J]. 28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 311 - 315
  • [49] End-to-End Multi-Task Learning for Lung Nodule Segmentation and Diagnosis
    Chen, Wei
    Wang, Qiuli
    Yang, Dan
    Zhang, Xiaohong
    Liu, Chen
    Li, Yucong
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 6710 - 6717
  • [50] End-to-end aspect-based sentiment analysis with hierarchical multi-task learning
    Wang, Xinyi
    Xu, Guangluan
    Zhang, Zequn
    Jin, Li
    Sun, Xian
    [J]. NEUROCOMPUTING, 2021, 455 : 178 - 188