Semantic Data Augmentation for End-to-End Mandarin Speech Recognition

Times cited: 2

Authors:

Sun, Jianwei [1]
Tang, Zhiyuan [1]
Yin, Hengxin [1]
Wang, Wei [1]
Zhao, Xi [1]
Zhao, Shuaijiang [1]
Lei, Xiaoning [1]
Zou, Wei [1]
Li, Xiangang [1]

Affiliation:

[1] KE Holdings Inc., Beijing, People's Republic of China

Source:

INTERSPEECH 2021 | 2021

Keywords:

Speech recognition; End-to-end; Data augmentation; Transposition; Alignment

DOI:

10.21437/Interspeech.2021-1162

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification code:

100104; 100213

Abstract:

End-to-end models have gradually become the preferred option for automatic speech recognition (ASR) applications. During the training of end-to-end ASR, data augmentation is a highly effective technique for regularizing the neural networks. This paper proposes a novel data augmentation technique for end-to-end Mandarin ASR based on semantic transposition of the transcriptions via syntax rules. Specifically, we first segment the transcriptions based on part-of-speech tags. Then transposition strategies, such as placing the object in front of the subject or swapping the subject and the object, are applied to the segmented sentences. Finally, the acoustic features corresponding to the transposed transcription are reassembled based on the audio-to-text forced alignment produced by a pre-trained ASR system. The combination of the original data and the augmented data is used to train a new ASR system. Experiments are conducted on Transformer-based [2] and Conformer-based [3] ASR systems. The results show that the proposed method yields consistent performance gains. Augmentation-related issues, such as the comparison of different transposition strategies and of the ratios for combining original and augmented data, are also investigated.
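The transposition and reassembly steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the transcription has already been segmented into chunks (the paper derives the segmentation from part-of-speech tags), that each chunk carries a hypothetical subject/verb/object role label, and that each chunk has a frame span taken from a pre-trained ASR system's forced alignment. The names `Segment`, `transpose` and `reassemble`, and the strategy strings `"swap"` and `"object_front"`, are hypothetical. Feature frames are plain Python lists so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import List, Sequence

Frame = List[float]  # one acoustic feature vector (e.g. a filterbank frame)


@dataclass(frozen=True)
class Segment:
    """A segmented chunk of the transcription (hypothetical layout)."""
    text: str   # surface characters of the chunk
    role: str   # assumed role label: "S" subject, "V" verb, "O" object, or other
    start: int  # first aligned feature frame (inclusive), from forced alignment
    end: int    # last aligned feature frame (exclusive)


def transpose(segments: Sequence[Segment], strategy: str) -> List[Segment]:
    """Reorder chunks by a transposition strategy; each chunk keeps its frame span."""
    out = list(segments)
    roles = [s.role for s in out]
    # Only handle simple sentences with exactly one subject and one object.
    if roles.count("S") != 1 or roles.count("O") != 1:
        return out
    i, j = roles.index("S"), roles.index("O")
    if strategy == "swap":
        # Swap the subject and the object.
        out[i], out[j] = out[j], out[i]
    elif strategy == "object_front":
        # Place the object in front of the subject (no-op if it already is).
        if j > i:
            out.insert(i, out.pop(j))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return out


def reassemble(features: Sequence[Frame], segments: Sequence[Segment]) -> List[Frame]:
    """Concatenate each chunk's aligned frames in the (possibly transposed) order."""
    frames: List[Frame] = []
    for seg in segments:
        frames.extend(features[seg.start:seg.end])
    return frames


if __name__ == "__main__":
    # Toy example: "我 喜欢 猫" ("I like cats") with 6 one-dimensional frames.
    segs = [Segment("我", "S", 0, 2), Segment("喜欢", "V", 2, 5), Segment("猫", "O", 5, 6)]
    feats = [[float(t)] for t in range(6)]
    for strategy in ("swap", "object_front"):
        new = transpose(segs, strategy)
        # swap -> 猫喜欢我 ("cats like me"); object_front -> 猫我喜欢 ("cats, I like")
        print(strategy, "".join(s.text for s in new), reassemble(feats, new))
```

Because every chunk keeps its original frame span, the augmented utterance is a reordering of real acoustic frames paired with the concatenated transposed text; this sketch does not smooth the joins between chunks, and the augmented pairs would then be combined with the original data at some chosen ratio.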

Pages: 1269-1273

Number of pages: 5
