MIXSPEECH: DATA AUGMENTATION FOR LOW-RESOURCE AUTOMATIC SPEECH RECOGNITION

被引:20
|
作者
Meng, Linghui [1 ,2 ]
Xu, Jin [3 ]
Tan, Xu [4 ]
Wang, Jindong [4 ]
Qin, Tao [4 ]
Xu, Bo [1 ,2 ]
机构
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing, Peoples R China
[4] Microsoft Res Asia, Beijing, Peoples R China
关键词
Speech Recognition; Data Augmentation; Low-resource; Mixup; MODELS;
D O I
10.1109/ICASSP39728.2021.9414483
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
In this paper, we propose MixSpeech, a simple yet effective data augmentation method based on mixup for automatic speech recognition (ASR). MixSpeech trains an ASR model by taking a weighted combination of two different speech features (e.g., mel-spectrograms or MFCC) as the input, and recognizing both text sequences, where the two recognition losses use the same combination weight. We apply MixSpeech on two popular end-to-end speech recognition models including LAS (Listen, Attend and Spell) and Transformer, and conduct experiments on several low-resource datasets including TIMIT, WSJ, and HKUST. Experimental results show that MixSpeech achieves better accuracy than the baseline models without data augmentation, and outperforms a strong data augmentation method SpecAugment on these recognition tasks. Specifically, MixSpeech outperforms SpecAugment with a relative PER improvement of 10.6% on TIMIT dataset, and achieves a strong WER of 4.7% on WSJ dataset.
引用
收藏
页码:7008 / 7012
页数:5
相关论文
共 50 条
  • [1] Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation
    Bartelds, Martijn
    San, Nay
    McDonnell, Bradley
    Jurafsky, Dan
    Wieling, Martijn
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 715 - 729
  • [2] Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition
    Matsuura, Kohei
    Mimura, Masato
    Sakai, Shinsuke
    Kawahara, Tatsuya
    [J]. INTERSPEECH 2020, 2020, : 2737 - 2741
  • [3] Optimizing Data Usage for Low-Resource Speech Recognition
    Qian, Yanmin
    Zhou, Zhikai
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 394 - 403
  • [4] Low-resource automatic speech recognition and error analyses of oral cancer speech
    Halpern, Bence Mark
    Feng, Siyuan
    van Son, Rob
    van den Brekel, Michiel
    Scharenborg, Odette
    [J]. SPEECH COMMUNICATION, 2022, 141 : 14 - 27
  • [5] Systems for Low-Resource Speech Recognition Tasks in Open Automatic Speech Recognition and Formosa Speech Recognition Challenges
    Lin, Hung-Pang
    Zhang, Yu-Jia
    Chen, Chia-Ping
    [J]. INTERSPEECH 2021, 2021, : 4339 - 4343
  • [6] LOW-RESOURCE EXPRESSIVE TEXT-TO-SPEECH USING DATA AUGMENTATION
    Huybrechts, Goeric
    Merritt, Thomas
    Comini, Giulia
    Perz, Bartek
    Shah, Raahil
    Lorenzo-Trueba, Jaime
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6593 - 6597
  • [7] EXPLORING EFFECTIVE DATA UTILIZATION FOR LOW-RESOURCE SPEECH RECOGNITION
    Zhou, Zhikai
    Wang, Wei
    Zhang, Wangyou
    Qian, Yanmin
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8192 - 8196
  • [8] Opportunities and Challenges of Automatic Speech Recognition Systems for Low-Resource Language Speakers
    Reitmaier, Thomas
    Wallington, Electra
    Raju, Dani Kalarikalayil
    Klejch, Ondrej
    Pearson, Jennifer
    Jones, Matt
    Bell, Peter
    Robinson, Simon
    [J]. PROCEEDINGS OF THE 2022 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI' 22), 2022,
  • [9] SUPERVISED AND UNSUPERVISED ACTIVE LEARNING FOR AUTOMATIC SPEECH RECOGNITION OF LOW-RESOURCE LANGUAGES
    Syed, Ali Raza
    Rosenberg, Andrew
    Kislal, Ellen
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5320 - 5324
  • [10] Investigate Automatic Speech Recognition and Keyword Search for Very Low-Resource Language
    Ni, Chongjia
    Ma, Bin
    [J]. 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP), 2017, : 336 - 340