Domain Adaptation Speech-to-Text for Low-Resource European Portuguese Using Deep Learning

Cited by: 2
Authors
Medeiros, Eduardo [1]
Corado, Leonel [1]
Rato, Luis [1,2]
Quaresma, Paulo [1,2]
Salgueiro, Pedro [1,2]
Affiliations
[1] Univ Evora, Escola Ciencias & Tecnol, P-7000671 Evora, Portugal
[2] Univ Evora, Ctr ALGORITMI, Vista Lab, P-7000671 Evora, Portugal
Keywords
machine learning; deep learning; deep neural networks; speech-to-text; automatic speech recognition; NVIDIA NeMo; GPUs; data-centric; Portuguese language; RECOGNITION
DOI
10.3390/fi15050159
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Automatic speech recognition (ASR), commonly known as speech-to-text, is the process of transcribing audio recordings into text, i.e., transforming speech into the corresponding sequence of words. This paper presents the optimization and evaluation of a deep learning ASR system for European Portuguese. We present a pipeline with stages for data acquisition, analysis, pre-processing, model creation, and evaluation. The proposed transfer learning approach starts from a model optimized for English, targets European Portuguese, and also draws on a source from a different domain: a multiple-variant Portuguese dataset consisting mostly of Brazilian Portuguese. Domain adaptation between European Portuguese and mixed (mostly Brazilian) Portuguese was investigated. The experiments used the NVIDIA NeMo framework with the QuartzNet15x5 architecture, which is based on 1D time-channel separable convolutions. Following this data-centric transfer learning approach, the optimized model achieved a state-of-the-art word error rate (WER) of 0.0503.
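For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words, so a WER of 0.0503 corresponds to roughly five errors per hundred reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenization is plain whitespace splitting,
    with no text normalization (casing, punctuation) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("b" -> "x") and one deletion ("d") over 4 reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.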
Pages: 16