Systems for Low-Resource Speech Recognition Tasks in Open Automatic Speech Recognition and Formosa Speech Recognition Challenges

Cited by: 2
Authors
Lin, Hung-Pang [1]
Zhang, Yu-Jia [1]
Chen, Chia-Ping [1]
Affiliations
[1] Natl Sun Yat Sen Univ, Kaohsiung, Taiwan
Keywords
low-resource speech recognition; Transformer; Conformer; domain adversarial training; convolution
DOI
10.21437/Interspeech.2021-358
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
As team NSYSU-MITLab, we participated in the low-resource speech recognition tasks of the Open Automatic Speech Recognition Challenge 2020 (OpenASR20) and the Formosa Speech Recognition Challenge 2020 (FSR-2020). For these tasks, we build and compare end-to-end (E2E) systems and Deep Neural Network Hidden Markov Model (DNN-HMM) systems. In the E2E systems, we implement an encoder with the Conformer architecture and a decoder with the Transformer architecture. In addition, a speaker classifier with a gradient reversal layer is included during training to improve robustness to speaker variation. In the DNN-HMM systems, we use Time-Restricted Self-Attention and Factorized Time Delay Neural Networks to learn the acoustic representation in the DNN front end. In OpenASR20, the best word error rates we achieved are 61.45% for Cantonese and 74.61% for Vietnamese. In FSR-2020, the best character error rate we achieved is 43.4% for Taiwanese Southern Min Recommended Characters, and the best syllable error rate is 25.4% for Taiwan Minnanyu Luomazi Pinyin.
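The gradient reversal layer mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a framework-free Python illustration of the general technique, in which the layer acts as the identity in the forward pass and multiplies the incoming gradient by a negative factor in the backward pass. The names `GradientReversal`, `lam`, and `encoder_gradient`, and the toy gradient values, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    lam: float = 1.0  # hypothetical reversal strength (an assumption, not from the paper)

    def forward(self, features: List[float]) -> List[float]:
        # Encoder features pass through unchanged to the speaker classifier.
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The speaker-classifier gradient is negated (and scaled) before it
        # reaches the encoder, which pushes the encoder away from
        # speaker-discriminative features.
        return [-self.lam * g for g in grad_output]


def encoder_gradient(
    grad_asr: List[float], grad_spk: List[float], grl: GradientReversal
) -> List[float]:
    """Combine the recognition gradient with the reversed speaker gradient."""
    reversed_spk = grl.backward(grad_spk)
    return [a + r for a, r in zip(grad_asr, reversed_spk)]


# Toy usage with made-up gradient values.
grl = GradientReversal(lam=0.5)
combined = encoder_gradient([1.0, 1.0], [2.0, -4.0], grl)
```

In this sketch the speaker classifier itself would still be trained with the unreversed gradient; only the path back into the encoder is flipped, which is what makes the training adversarial with respect to speaker identity.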
Pages: 4339 - 4343
Number of pages: 5
Related papers
50 in total
  • [1] Opportunities and Challenges of Automatic Speech Recognition Systems for Low-Resource Language Speakers
    Reitmaier, Thomas
    Wallington, Electra
    Raju, Dani Kalarikalayil
    Klejch, Ondrej
    Pearson, Jennifer
    Jones, Matt
    Bell, Peter
    Robinson, Simon
    [J]. PROCEEDINGS OF THE 2022 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI '22), 2022
  • [2] Enrollment in low-resource speech recognition systems
    Deligne, S
    Dharanipragada, S
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 341 - 344
  • [3] Low-resource automatic speech recognition and error analyses of oral cancer speech
    Halpern, Bence Mark
    Feng, Siyuan
    van Son, Rob
    van den Brekel, Michiel
    Scharenborg, Odette
    [J]. SPEECH COMMUNICATION, 2022, 141 : 14 - 27
  • [4] MIXSPEECH: DATA AUGMENTATION FOR LOW-RESOURCE AUTOMATIC SPEECH RECOGNITION
    Meng, Linghui
    Xu, Jin
    Tan, Xu
    Wang, Jindong
    Qin, Tao
    Xu, Bo
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7008 - 7012
  • [5] OpenASR20: An Open Challenge for Automatic Speech Recognition of Conversational Telephone Speech in Low-Resource Languages
    Peterson, Kay
    Tong, Audrey
    Yu, Yan
    [J]. INTERSPEECH 2021, 2021, : 4324 - 4328
  • [6] CURRICULUM OPTIMIZATION FOR LOW-RESOURCE SPEECH RECOGNITION
    Kuznetsova, Anastasia
    Kumar, Anurag
    Fox, Jennifer Drexler
    Tyers, Francis M.
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8187 - 8191
  • [7] OpenASR21: The Second Open Challenge for Automatic Speech Recognition of Low-Resource Languages
    Peterson, Kay
    Tong, Audrey
    Yu, Yan
    [J]. INTERSPEECH 2022, 2022, : 4895 - 4899
  • [8] Automatic speech recognition systems
    Catariov, A
    [J]. Information Technologies 2004, 2004, 5822 : 83 - 93
  • [9] Frontier Research on Low-Resource Speech Recognition Technology
    Slam, Wushour
    Li, Yanan
    Urouvas, Nurmamet
    [J]. SENSORS, 2023, 23 (22)
  • [10] ADVERSARIAL MULTILINGUAL TRAINING FOR LOW-RESOURCE SPEECH RECOGNITION
    Yi, Jiangyan
    Tao, Jianhua
    Wen, Zhengqi
    Bai, Ye
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4899 - 4903