Multi-task CTC Training with Auxiliary Feature Reconstruction for End-to-end Speech Recognition

Cited by: 1
Authors:
Kurata, Gakuto [1]
Audhkhasi, Kartik [1]
Affiliations:
[1] IBM Res AI, San Jose, CA 95120 USA
Source: INTERSPEECH 2019
Keywords: End-to-end automatic speech recognition; Connectionist Temporal Classification; Long short-term memory; Multi-task learning; MODEL
DOI: 10.21437/Interspeech.2019-1710
Chinese Library Classification (CLC): R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes: 100104; 100213
Abstract:
We present multi-task Connectionist Temporal Classification (CTC) training for end-to-end (E2E) automatic speech recognition, with input feature reconstruction as an auxiliary task. The main E2E CTC task and the auxiliary reconstruction task share the encoder network, and the auxiliary task tries to reconstruct the input features from the encoded information. In addition to standard feature reconstruction, we distort the input features only in the auxiliary reconstruction task, for example by (1) swapping the former and latter halves of an utterance, or (2) using only part of an utterance by stripping its beginning or end. These distortions intentionally suppress long-span dependencies in the time domain, which helps avoid overfitting to the training data. We trained phone-based and word-based CTC models with the proposed multi-task learning and demonstrated that it improves ASR accuracy on various test sets, both matched and unmatched with the training data.
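The abstract is the only description of the method in this record. Below is a minimal sketch of how such a multi-task objective could be wired up, assuming a PyTorch setup. The class and function names (MultiTaskCTCModel, distort, train_step), the bidirectional-LSTM encoder sizes, the MSE reconstruction loss, the loss weight alpha, and the zero-padding of the stripped reconstruction target are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskCTCModel(nn.Module):
    """Shared LSTM encoder with a CTC head (main task) and a
    feature-reconstruction head (auxiliary task)."""
    def __init__(self, feat_dim=40, hidden=320, vocab_size=46):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=4,
                               bidirectional=True, batch_first=True)
        self.ctc_head = nn.Linear(2 * hidden, vocab_size)   # main task: CTC over phones or words
        self.recon_head = nn.Linear(2 * hidden, feat_dim)   # auxiliary task: rebuild input features

    def forward(self, feats):
        enc, _ = self.encoder(feats)                         # (B, T, 2*hidden)
        return self.ctc_head(enc), self.recon_head(enc)

def distort(feats, mode="swap"):
    """Build the reconstruction target; the distortions suppress long-span
    time dependencies.
    'swap'  : exchange the former and latter halves of the utterance.
    'strip' : keep only part of the utterance (here the last three quarters),
              zero-padded back to full length so the framewise loss stays
              defined (the padding is a simplification, not necessarily the
              paper's recipe)."""
    T = feats.size(1)
    if mode == "swap":
        return torch.cat([feats[:, T // 2:], feats[:, :T // 2]], dim=1)
    if mode == "strip":
        kept = feats[:, T // 4:]
        return F.pad(kept, (0, 0, 0, T - kept.size(1)))
    return feats

def train_step(model, feats, feat_lens, targets, target_lens, alpha=0.1):
    """One training step: CTC loss on the clean input plus a weighted MSE
    reconstruction loss against a distorted copy of the same input."""
    logits, recon = model(feats)
    log_probs = logits.log_softmax(-1).transpose(0, 1)        # (T, B, V), as CTCLoss expects
    main = nn.CTCLoss(blank=0, zero_infinity=True)(log_probs, targets, feat_lens, target_lens)
    aux = F.mse_loss(recon, distort(feats, mode="swap"))
    return main + alpha * aux

Note that in this sketch only the reconstruction target is distorted; the shared encoder always sees the clean features, which follows the abstract's statement that the input is distorted only in the auxiliary reconstruction task.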
Pages: 1636-1640 (5 pages)