Semantic Data Augmentation for End-to-End Mandarin Speech Recognition

Cited by: 2
Authors
Sun, Jianwei [1]
Tang, Zhiyuan [1]
Yin, Hengxin [1]
Wang, Wei [1]
Zhao, Xi [1]
Zhao, Shuaijiang [1]
Lei, Xiaoning [1]
Zou, Wei [1]
Li, Xiangang [1]
Affiliation
[1] KE Holdings Inc, Beijing, Peoples R China
Source
INTERSPEECH 2021
Keywords
Speech recognition; End-to-end; Data augmentation; Transposition; Alignment
DOI
10.21437/Interspeech.2021-1162
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
End-to-end models have gradually become the preferred choice for automatic speech recognition (ASR) applications. When training end-to-end ASR models, data augmentation is a highly effective technique for regularizing the neural network. This paper proposes a novel data augmentation technique for end-to-end Mandarin ASR based on semantic transposition of the transcriptions via syntax rules. Specifically, we first segment the transcriptions according to part-of-speech tags. Transposition strategies, such as placing the object in front of the subject or swapping the subject and the object, are then applied to the segmented sentences. Finally, the acoustic features corresponding to the transposed transcription are reassembled based on the audio-to-text forced alignment produced by a pre-trained ASR system. The original data combined with the augmented data is used to train a new ASR system. Experiments are conducted on Transformer-based [2] and Conformer-based [3] ASR systems, and the results show that the proposed method gives consistent performance gains. Augmentation-related issues, such as the comparison of different transposition strategies and the mixing ratio of original to augmented data, are also investigated.
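The pipeline summarized in the abstract (segment, transpose, splice aligned features, mix with original data) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the coarse role labels `SUBJ`, `VERB`, `OBJ` are hypothetical stand-ins for whatever the paper derives from its part-of-speech tags and syntax rules; each segment's frame span is assumed to come from the forced alignment; and `combine` with its `ratio` parameter is a hypothetical way to mix original and augmented utterances. All names (`Segment`, `swap_subject_object`, `object_to_front`, `reassemble`, `combine`) are illustrative.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np


@dataclass(frozen=True)
class Segment:
    text: str   # a word/phrase from the POS-based segmentation
    role: str   # hypothetical coarse label: "SUBJ", "VERB", "OBJ" or "OTHER"
    start: int  # first feature frame (inclusive), assumed from forced alignment
    end: int    # last feature frame (exclusive)


def _roles(segments: List[Segment]) -> List[str]:
    return [s.role for s in segments]


def swap_subject_object(segments: List[Segment]) -> Optional[List[Segment]]:
    """Swap the first subject span with the first object span."""
    roles = _roles(segments)
    if "SUBJ" not in roles or "OBJ" not in roles:
        return None  # rule does not apply to this sentence
    i, j = roles.index("SUBJ"), roles.index("OBJ")
    order = list(segments)
    order[i], order[j] = order[j], order[i]
    return order


def object_to_front(segments: List[Segment]) -> Optional[List[Segment]]:
    """Move the object span so that it directly precedes the subject."""
    roles = _roles(segments)
    if "SUBJ" not in roles or "OBJ" not in roles:
        return None
    i, j = roles.index("SUBJ"), roles.index("OBJ")
    if j < i:
        return None  # object already precedes the subject
    order = list(segments)
    obj = order.pop(j)
    order.insert(i, obj)
    return order


def reassemble(features: np.ndarray, order: List[Segment]) -> Tuple[str, np.ndarray]:
    """Build the transposed transcription and splice the aligned frames to match."""
    text = "".join(s.text for s in order)
    feats = np.concatenate([features[s.start:s.end] for s in order], axis=0)
    return text, feats


def combine(original: list, augmented: list, ratio: float, seed: int = 0) -> list:
    """Add up to ratio * len(original) randomly chosen augmented items (assumed scheme)."""
    k = min(len(augmented), int(ratio * len(original)))
    rng = random.Random(seed)
    return original + rng.sample(augmented, k)


if __name__ == "__main__":
    feats = np.arange(40 * 4, dtype=np.float32).reshape(40, 4)  # 40 frames x 4 dims
    sent = [Segment("我", "SUBJ", 0, 10),
            Segment("喜欢", "VERB", 10, 25),
            Segment("苹果", "OBJ", 25, 40)]
    text, new_feats = reassemble(feats, swap_subject_object(sent))
    print(text, new_feats.shape)  # 苹果喜欢我 (40, 4)
```

Because each segment keeps its own frames, the transposed utterance has the same total length as the original, only reordered. Splicing at word boundaries like this can leave abrupt acoustic discontinuities at the joins; the sketch does not attempt to smooth them.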
Pages: 1269-1273
Page count: 5
Related papers
50 in total
  • [11] Data Augmentation for End-to-End Optical Music Recognition
    Lopez-Gutierrez, Juan C.
    Valero-Mas, Jose J.
    Castellanos, Francisco J.
    Calvo-Zaragoza, Jorge
    [J]. DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021 WORKSHOPS, PT I, 2021, 12916 : 59 - 73
  • [12] Semantic Mask for Transformer based End-to-End Speech Recognition
    Wang, Chengyi
    Wu, Yu
    Du, Yujiao
    Li, Jinyu
    Liu, Shujie
    Lu, Liang
    Ren, Shuo
    Ye, Guoli
    Zhao, Sheng
    Zhou, Ming
    [J]. INTERSPEECH 2020, 2020, : 971 - 975
  • [13] You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
    Laptev, Aleksandr
    Korostik, Roman
    Svischev, Aleksey
    Andrusenko, Andrei
    Medennikov, Ivan
    Rybin, Sergey
    [J]. 2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020, : 439 - 444
  • [14] Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition
    Yang, Yuting
    Du, Binbin
    Li, Yuke
    [J]. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 175 - 179
  • [15] Tibetan-Mandarin Bilingual Speech Recognition Based on End-to-End Framework
    Wang, Qingnan
    Guo, Wu
    Chen, Peixin
    Song, Yan
    [J]. 2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1214 - 1217
  • [16] SUBBAND TEMPORAL ENVELOPE FEATURES AND DATA AUGMENTATION FOR END-TO-END RECOGNITION OF DISTANT CONVERSATIONAL SPEECH
    Do, Cong-Thanh
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6251 - 6255
  • [17] AN ANALYSIS OF DECODING FOR ATTENTION-BASED END-TO-END MANDARIN SPEECH RECOGNITION
    Jiang, Dongwei
    Zou, Wei
    Zhao, Shuaijiang
    Yang, Guilin
    Li, Xiangang
    [J]. 2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 384 - 388
  • [18] Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin
    Dong, Linhao
    Zhou, Shiyu
    Chen, Wei
    Xu, Bo
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 816 - 820
  • [19] On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition
    Zeng, Zhiping
    Khassanov, Yerbolat
    Pham, Van Tung
    Xu, Haihua
    Chng, Eng Siong
    Li, Haizhou
    [J]. INTERSPEECH 2019, 2019, : 2165 - 2169
  • [20] MKD: Mixup-Based Knowledge Distillation for Mandarin End-to-End Speech Recognition
    Wu, Xing
    Jin, Yifan
    Wang, Jianjia
    Qian, Quan
    Guo, Yike
    [J]. ALGORITHMS, 2022, 15 (05)