Domain Adaptation Speech-to-Text for Low-Resource European Portuguese Using Deep Learning

Cited by: 2
Authors
Medeiros, Eduardo [1]
Corado, Leonel [1]
Rato, Luis [1, 2]
Quaresma, Paulo [1, 2]
Salgueiro, Pedro [1, 2]
Affiliations
[1] Univ Evora, Escola Ciencias & Tecnol, P-7000671 Evora, Portugal
[2] Univ Evora, Ctr ALGORITMI, Vista Lab, P-7000671 Evora, Portugal
Keywords
machine learning; deep learning; deep neural networks; speech-to-text; automatic speech recognition; NVIDIA NeMo; GPUs; data-centric; Portuguese language; RECOGNITION
DOI
10.3390/fi15050159
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic speech recognition (ASR), commonly known as speech-to-text, is the process of transcribing audio recordings into text, i.e., transforming speech into the corresponding sequence of words. This paper presents the optimization and evaluation of a deep learning ASR system for the European Portuguese language. We present a pipeline composed of several stages: data acquisition, analysis, pre-processing, model creation, and evaluation. We propose a transfer learning approach that starts from a model optimized for English, targets European Portuguese, and adds to the transfer process a source from a different domain: a multiple-variant Portuguese language dataset consisting mostly of Brazilian Portuguese. Domain adaptation was investigated between European Portuguese and mixed (mostly Brazilian) Portuguese. The optimization and evaluation used the NVIDIA NeMo framework with the QuartzNet15x5 architecture, which is based on 1D time-channel separable convolutions. Following this data-centric transfer learning approach, the model was optimized, achieving a state-of-the-art word error rate (WER) of 0.0503.
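The abstract reports a WER of 0.0503, but this record does not say how the metric was computed. As a minimal sketch, assuming the conventional definition (word-level edit distance between reference and hypothesis, divided by the number of reference words), WER can be calculated as below. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the paper or from NVIDIA NeMo.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    no text normalization (case, punctuation) is applied here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one substituted word out of five.
example_wer = word_error_rate("o gato dorme no sofá", "o gato come no sofá")
```

Under this definition, a WER of 0.0503 corresponds to roughly five word errors per hundred reference words.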
Pages: 16
Related papers
50 in total
  • [41] End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning
    Chen, Yuan-Jui
    Tu, Tao
    Yeh, Cheng-chieh
    Lee, Hung-yi
    [J]. INTERSPEECH 2019, 2019, : 2075 - 2079
  • [42] Low-resource entity resolution with domain generalization and active learning
    Xu, Zhihong
    Wang, Ning
    [J]. NEUROCOMPUTING, 2024, 599
  • [43] Low-resource Deep Entity Resolution with Transfer and Active Learning
    Kasai, Jungo
    Qian, Kun
    Gurajada, Sairam
    Li, Yunyao
    Popa, Lucian
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 5851 - 5861
  • [44] Domain-Aligned Data Augmentation for Low-Resource and Imbalanced Text Classification
    Stylianou, Nikolaos
    Chatzakou, Despoina
    Tsikrika, Theodora
    Vrochidis, Stefanos
    Kompatsiaris, Ioannis
    [J]. ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT II, 2023, 13981 : 172 - 187
  • [45] Low-Resource Emotional Speech Synthesis: Transfer Learning and Data Requirements
    Nesterenko, Anton
    Akhmerov, Ruslan
    Matveeva, Yulia
    Goremykina, Anna
    Astankov, Dmitry
    Shuranov, Evgeniy
    Shirshova, Alexandra
    [J]. SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 508 - 521
  • [46] USING SPEECH ENHANCEMENT TO REALIZE SPEECH SYNTHESIS OF LOW-RESOURCE DUNGAN LANGUAGES
    Jiang, Rui
    Chen, Chengsi
    Shan, Xin
    Yang, Hongwu
    [J]. 2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2021, : 193 - 198
  • [47] Multilingual Meta-Transfer Learning for Low-Resource Speech Recognition
    Zhou, Rui
    Koshikawa, Takaki
    Ito, Akinori
    Nose, Takashi
    Chen, Chia-Ping
    [J]. IEEE Access, 2024, 12 : 158493 - 158504
  • [48] A Method Improves Speech Recognition with Contrastive Learning in Low-Resource Languages
    Sun, Lixu
    Yolwas, Nurmemet
    Jiang, Lina
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (08):
  • [49] Language-Adversarial Transfer Learning for Low-Resource Speech Recognition
    Yi, Jiangyan
    Tao, Jianhua
    Wen, Zhengqi
    Bai, Ye
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (03) : 621 - 630
  • [50] META LEARNING FOR END-TO-END LOW-RESOURCE SPEECH RECOGNITION
    Hsu, Jui-Yang
    Chen, Yuan-Jui
    Lee, Hung-yi
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7844 - 7848