Multi-task CTC Training with Auxiliary Feature Reconstruction for End-to-end Speech Recognition

Cited by: 1
Authors:
Kurata, Gakuto [1]
Audhkhasi, Kartik [1]
Affiliations:
[1] IBM Res AI, San Jose, CA 95120 USA
Source: INTERSPEECH 2019
Keywords: End-to-end automatic speech recognition; Connectionist Temporal Classification; Long short-term memory; Multi-task learning; MODEL
DOI: 10.21437/Interspeech.2019-1710
Chinese Library Classification (CLC): R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes: 100104; 100213
Abstract:
We present multi-task Connectionist Temporal Classification (CTC) training for end-to-end (E2E) automatic speech recognition, with input feature reconstruction as an auxiliary task. The main E2E CTC task and the auxiliary reconstruction task share the encoder network, and the auxiliary task tries to reconstruct the input features from the encoded information. In addition to standard feature reconstruction, we distort the input features only in the auxiliary reconstruction task, for example by (1) swapping the former and latter halves of an utterance, or (2) using only part of an utterance by stripping its beginning or end. These distortions intentionally suppress long-span dependencies in the time domain, which helps avoid overfitting to the training data. We trained phone-based and word-based CTC models with the proposed multi-task learning and demonstrated that it improves ASR accuracy on various test sets, both matched and unmatched with the training data.
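The abstract is the only description of the method in this record. Below is a minimal sketch of how such a multi-task objective could be wired up, assuming a PyTorch setup. The class and function names (MultiTaskCTCModel, distort, train_step), the bidirectional-LSTM encoder sizes, the MSE reconstruction loss, the loss weight alpha, and the zero-padding of the stripped reconstruction target are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskCTCModel(nn.Module):
    """Shared LSTM encoder with a CTC head (main task) and a
    feature-reconstruction head (auxiliary task)."""
    def __init__(self, feat_dim=40, hidden=320, vocab_size=46):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=4,
                               bidirectional=True, batch_first=True)
        self.ctc_head = nn.Linear(2 * hidden, vocab_size)   # main task: CTC over phones or words
        self.recon_head = nn.Linear(2 * hidden, feat_dim)   # auxiliary task: rebuild input features

    def forward(self, feats):
        enc, _ = self.encoder(feats)                         # (B, T, 2*hidden)
        return self.ctc_head(enc), self.recon_head(enc)

def distort(feats, mode="swap"):
    """Build the reconstruction target; the distortions suppress long-span
    time dependencies.
    'swap'  : exchange the former and latter halves of the utterance.
    'strip' : keep only part of the utterance (here the last three quarters),
              zero-padded back to full length so the framewise loss stays
              defined (the padding is a simplification, not necessarily the
              paper's recipe)."""
    T = feats.size(1)
    if mode == "swap":
        return torch.cat([feats[:, T // 2:], feats[:, :T // 2]], dim=1)
    if mode == "strip":
        kept = feats[:, T // 4:]
        return F.pad(kept, (0, 0, 0, T - kept.size(1)))
    return feats

def train_step(model, feats, feat_lens, targets, target_lens, alpha=0.1):
    """One training step: CTC loss on the clean input plus a weighted MSE
    reconstruction loss against a distorted copy of the same input."""
    logits, recon = model(feats)
    log_probs = logits.log_softmax(-1).transpose(0, 1)        # (T, B, V), as CTCLoss expects
    main = nn.CTCLoss(blank=0, zero_infinity=True)(log_probs, targets, feat_lens, target_lens)
    aux = F.mse_loss(recon, distort(feats, mode="swap"))
    return main + alpha * aux

Note that in this sketch only the reconstruction target is distorted; the shared encoder always sees the clean features, which follows the abstract's statement that the input is distorted only in the auxiliary reconstruction task.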
Pages: 1636-1640 (5 pages)