Semantic Data Augmentation for End-to-End Mandarin Speech Recognition

Citations: 2
Authors
Sun, Jianwei [1]
Tang, Zhiyuan [1]
Yin, Hengxin [1]
Wang, Wei [1]
Zhao, Xi [1]
Zhao, Shuaijiang [1]
Lei, Xiaoning [1]
Zou, Wei [1]
Li, Xiangang [1]
Affiliation
[1] KE Holdings Inc., Beijing, People's Republic of China
Source
Keywords
Speech recognition; End-to-end; Data augmentation; Transposition; Alignment
DOI
10.21437/Interspeech.2021-1162
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
End-to-end models have gradually become the preferred option for automatic speech recognition (ASR) applications. During the training of end-to-end ASR models, data augmentation is an effective technique for regularizing the neural networks. This paper proposes a novel data augmentation technique for end-to-end Mandarin ASR based on semantic transposition of the transcriptions via syntax rules. Specifically, we first segment the transcriptions based on part-of-speech tags. Then transposition strategies, such as placing the object in front of the subject or swapping the subject and the object, are applied to the segmented sentences. Finally, the acoustic features corresponding to the transposed transcription are reassembled based on the audio-to-text forced alignment produced by a pre-trained ASR system. The combination of the original and augmented data is used to train a new ASR system. Experiments are conducted on Transformer-based [2] and Conformer-based [3] ASR systems. The results show that the proposed method gives a consistent performance gain. Augmentation-related issues, such as the comparison of different strategies and the ratios for data combination, are also investigated.
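The three steps in the abstract (segment, transpose, reassemble features from the forced alignment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Segment` type, the role labels ("subj", "verb", "obj"), the frame boundaries, and both function names are hypothetical, and the segmentation and alignment are assumed to have been produced beforehand by a part-of-speech tagger and a pre-trained ASR system.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    text: str   # transcript chunk, e.g. a subject or object phrase
    role: str   # hypothetical syntactic role label: "subj", "verb", "obj"
    start: int  # first feature frame (inclusive), assumed from forced alignment
    end: int    # last feature frame (exclusive)


def swap_subject_object(segments: Sequence[Segment]) -> List[Segment]:
    """Return a reordered copy with the first subject and first object swapped."""
    roles = [s.role for s in segments]
    if "subj" not in roles or "obj" not in roles:
        return list(segments)  # nothing to transpose
    i, j = roles.index("subj"), roles.index("obj")
    out = list(segments)
    out[i], out[j] = out[j], out[i]
    return out


def reassemble(
    features: Sequence[Sequence[float]], segments: Sequence[Segment]
) -> Tuple[List[Sequence[float]], str]:
    """Concatenate each segment's feature frames and text in the new order."""
    frames: List[Sequence[float]] = []
    for seg in segments:
        frames.extend(features[seg.start:seg.end])
    text = "".join(seg.text for seg in segments)
    return frames, text


# Toy utterance: 10 one-dimensional "feature frames" whose value is the frame index.
features = [[float(t)] for t in range(10)]
segments = [
    Segment("我", "subj", 0, 3),
    Segment("喜欢", "verb", 3, 7),
    Segment("你", "obj", 7, 10),
]

aug_frames, aug_text = reassemble(features, swap_subject_object(segments))
```

In this toy case the augmented transcript becomes "你喜欢我" and the frames are reordered to match it. A training set would then mix such augmented pairs with the original data at some ratio, which the abstract says is one of the factors investigated.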
Pages: 1269-1273
Number of pages: 5
Related papers
50 in total
  • [1] Data Augmentation for End-to-end Silent Speech Recognition for Laryngectomees
    Cao, Beiming
    Teplansky, Kristin
    Sebkhi, Nordine
    Bhavsar, Arpan
    Inan, Omer T.
    Samlan, Robin
    Mau, Ted
    Wang, Jun
    [J]. INTERSPEECH 2022, 2022, : 3653 - 3657
  • [2] Data Augmentation for End-to-End Code-Switching Speech Recognition
    Du, Chenpeng
    Li, Hao
    Lu, Yizhou
    Wang, Lan
    Qian, Yanmin
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 194 - 200
  • [3] SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition
    Song, Xingchen
    Wu, Zhiyong
    Huang, Yiheng
    Su, Dan
    Meng, Helen
    [J]. INTERSPEECH 2020, 2020, : 581 - 585
  • [4] Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
    Amodei, Dario
    Ananthanarayanan, Sundaram
    Anubhai, Rishita
    Bai, Jingliang
    Battenberg, Eric
    Case, Carl
    Casper, Jared
    Catanzaro, Bryan
    Cheng, Qiang
    Chen, Guoliang
    Chen, Jie
    Chen, Jingdong
    Chen, Zhijie
    Chrzanowski, Mike
    Coates, Adam
    Diamos, Greg
    Ding, Ke
    Du, Niandong
    Elsen, Erich
    Engel, Jesse
    Fang, Weiwei
    Fan, Linxi
    Fougner, Christopher
    Gao, Liang
    Gong, Caixia
    Hannun, Awni
    Han, Tony
    Johannes, Lappi Vaino
    Jiang, Bing
    Ju, Cai
    Jun, Billy
    LeGresley, Patrick
    Lin, Libby
    Liu, Junjie
    Liu, Yang
    Li, Weigao
    Li, Xiangang
    Ma, Dongpeng
    Narang, Sharan
    Ng, Andrew
    Ozair, Sherjil
    Peng, Yiping
    Prenger, Ryan
    Qian, Sheng
    Quan, Zongfeng
    Raiman, Jonathan
    Rao, Vinay
    Satheesh, Sanjeev
    Seetapun, David
    Sengupta, Shubho
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [5] End-to-End Mandarin Speech Recognition Combining CNN and BLSTM
    Wang, Dong
    Wang, Xiaodong
    Lv, Shaohe
    [J]. SYMMETRY-BASEL, 2019, 11 (05):
  • [6] Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition
    Tu, Zehai
    Deadman, Jack
    Ma, Ning
    Barker, Jon
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7447 - 7451
  • [7] Convolutional Dropout and Wordpiece Augmentation for End-to-End Speech Recognition
    Xu, Hainan
    Huang, Yinghui
    Zhu, Yun
    Audhkhasi, Kartik
    Ramabhadran, Bhuvana
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5984 - 5988
  • [8] A Comparable Study of Modeling Units for End-to-End Mandarin Speech Recognition
    Zou, Wei
    Jiang, Dongwei
    Zhao, Shuaijiang
    Yang, Guilin
    Li, Xiangang
    [J]. 2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 369 - 373
  • [9] Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios
    Tsunoo, Emiru
    Shibata, Kentaro
    Narisetty, Chaitanya
    Kashiwagi, Yosuke
    Watanabe, Shinji
    [J]. INTERSPEECH 2021, 2021, : 301 - 305
  • [10] StarGAN for Emotional Speech Conversion: Validated by Data Augmentation of End-to-End Emotion Recognition
    Rizos, Georgios
    Baird, Alice
    Elliott, Max
    Schuller, Bjorn
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3502 - 3506