Investigation of Transformer based Spelling Correction Model for CTC-based End-to-End Mandarin Speech Recognition

Cited by: 14

Authors:

Zhang, Shiliang [1]

Lei, Ming [1]

Yan, Zhijie [1]

Affiliations:

[1] Alibaba Group, Machine Intelligence Technology, Beijing, People's Republic of China

Source:

INTERSPEECH 2019 | 2019

Keywords:

speech recognition; spelling correction; CTC; End-to-End; Transformer; ERRORS

DOI:

10.21437/Interspeech.2019-1290

Chinese Library Classification (CLC):

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Connectionist Temporal Classification (CTC) based end-to-end speech recognition systems usually need to incorporate an external language model through WFST-based decoding to achieve promising results. This is especially important for Mandarin speech recognition, where homophones cause many substitution errors. The linguistic information introduced by the language model helps to resolve these substitution errors. In this work, we propose a Transformer-based spelling correction model to automatically correct errors, especially substitution errors, made by a CTC-based Mandarin speech recognition system. Specifically, we train a Transformer with an encoder-decoder architecture, using the recognition results generated by CTC-based systems as input and the ground-truth transcriptions as output, a setup much like machine translation. Experimental results on a 20,000-hour Mandarin speech recognition task show that the proposed spelling correction model achieves a CER of 3.41%, a relative improvement of 22.9% and 53.2% over the baseline CTC-based systems decoded with and without a language model, respectively.
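The CER and relative-improvement figures in the abstract can be illustrated with a minimal Python sketch. It assumes the usual definition of character error rate (character-level edit distance divided by reference length) and of relative improvement (the fractional reduction in CER); the record does not spell out either definition. The function names and the example sentence pair are hypothetical, and the baseline CERs are not stated in this record, so `implied_baseline` only back-computes approximate values from the reported 3.41% and the two percentages.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference character
                curr[j - 1] + 1,         # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline_cer: float, new_cer: float) -> float:
    """Fractional CER reduction relative to the baseline."""
    return (baseline_cer - new_cer) / baseline_cer


def implied_baseline(new_cer: float, rel_improvement: float) -> float:
    """Back-compute the baseline CER implied by a relative improvement."""
    return new_cer / (1.0 - rel_improvement)


# Hypothetical homophone error: 气 and 汽 are both pronounced "qi".
reference = "今天天气很好"
hypothesis = "今天天汽很好"
print(f"CER of the toy pair: {cer(reference, hypothesis):.4f}")

# Approximate baselines implied by the abstract (not reported in this record).
with_lm = implied_baseline(3.41, 0.229)
without_lm = implied_baseline(3.41, 0.532)
print(f"Implied baseline with LM:    {with_lm:.2f}%")
print(f"Implied baseline without LM: {without_lm:.2f}%")
```

Because the percentages in the abstract are rounded, the back-computed baselines are approximate and should be checked against the paper itself.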

Pages: 2180-2184

Page count: 5
