Improved Deliberation Network with Text Pre-training for Code-Switching Automatic Speech Recognition

Cited by: 0

Authors:

Shen, Zhijie [1]

Guo, Wu [1]

Affiliations:

[1] Univ Sci & Technol China, Dept Elect Engn & Informat Sci EEIS, Hefei, Peoples R China

Source:

INTERSPEECH 2022 | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

automatic speech recognition; code-switching; deliberation network; text pre-training;

DOI:

10.21437/Interspeech.2022-221

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper proposes an improved deliberation network (DN) for end-to-end code-switching (CS) automatic speech recognition (ASR). In a conventional DN, the acoustic encoding and the first-pass hypothesis encoding are used separately and simply combined by summation, which cannot take full advantage of their potential complementarity. The proposed improved DN therefore exploits the relationship between the two encodings through a two-stage process: first, it integrates the two encodings into a unified semantic space through a shared encoder; second, it captures the relevant information from the acoustic encoding through an attention mechanism before the final decoding process. Moreover, the lack of paired training data restricts the generalization ability of the model in CS ASR. To address this problem, the developed DN is pre-trained with a denoising sequence-to-sequence (seq2seq) objective using unpaired text data. Experiments on a Chinese-English CS dataset demonstrate the effectiveness of the proposed method: compared with the conventional DN, a 13.5% relative error rate reduction is observed.
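As a rough illustration of how unpaired text can feed a denoising seq2seq objective, the sketch below builds (noisy input, clean target) pairs by masking tokens at random. The abstract does not say which noise function the authors use, so token masking, the `<mask>` symbol, the `mask_prob` value, and the function names are all assumptions made for this example, not the paper's method.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, not taken from the paper


def corrupt(tokens, mask_prob=0.3, seed=0):
    """Return a noisy copy of `tokens` with some positions replaced by MASK.

    Token masking is an assumed noise function; the abstract only states
    that a denoising seq2seq objective is used.
    """
    rng = random.Random(seed)
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def make_denoising_pair(sentence, mask_prob=0.3, seed=0):
    """Build one (noisy input, clean target) pair from an unpaired sentence."""
    clean = sentence.split()
    noisy = corrupt(clean, mask_prob=mask_prob, seed=seed)
    return noisy, clean


if __name__ == "__main__":
    # A made-up Chinese-English code-switching sentence, whitespace-tokenized.
    noisy, clean = make_denoising_pair("我 想 听 一首 jazz music", seed=1)
    print("input :", " ".join(noisy))
    print("target:", " ".join(clean))
```

A model pre-trained this way learns to reconstruct the clean target from the corrupted input, which requires text only and no paired audio.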


Pages: 3854-3858

Page count: 5
