Systems for Low-Resource Speech Recognition Tasks in Open Automatic Speech Recognition and Formosa Speech Recognition Challenges

Cited by: 2
Authors
Lin, Hung-Pang [1]
Zhang, Yu-Jia [1]
Chen, Chia-Ping [1]
Affiliation
[1] Natl Sun Yat Sen Univ, Kaohsiung, Taiwan
Source
Interspeech 2021
Keywords
low-resource speech recognition; Transformer; Conformer; domain adversarial training; convolution
DOI
10.21437/Interspeech.2021-358
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Under the team name NSYSU-MITLab, we participated in the low-resource speech recognition tasks of the Open Automatic Speech Recognition Challenge 2020 (OpenASR20) and the Formosa Speech Recognition Challenge 2020 (FSR-2020). For these tasks, we build and compare end-to-end (E2E) systems and Deep Neural Network Hidden Markov Model (DNN-HMM) systems. In the E2E systems, we implement an encoder with the Conformer architecture and a decoder with the Transformer architecture. In addition, a speaker classifier with a gradient reversal layer is included in the training phase to improve robustness to speaker variation. In the DNN-HMM systems, we implement Time-Restricted Self-Attention and Factorized Time Delay Neural Networks for DNN front-end acoustic representation learning. In OpenASR20, the best word error rates we achieved are 61.45% for Cantonese and 74.61% for Vietnamese. In FSR-2020, the best character error rate we achieved is 43.4% for Taiwanese Southern Min Recommended Characters, and the best syllable error rate is 25.4% for Taiwan Minnanyu Luomazi Pinyin.
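The word, character, and syllable error rates quoted in the abstract are edit-distance metrics: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. The following is a minimal sketch of that computation, assuming whitespace tokenization for words and per-character tokenization for characters. The function names are hypothetical, and the challenges' official scoring tools may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def word_error_rate(ref, hyp):
    """WER, assuming words are separated by whitespace."""
    return error_rate(ref.split(), hyp.split())


def char_error_rate(ref, hyp):
    """CER, assuming spaces are ignored and every character is a token."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))
```

A syllable error rate follows the same pattern once the transcripts are split into syllable tokens. Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.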
Pages: 4339-4343
Number of pages: 5