Introduction To Partial Fine-tuning: A Comprehensive Evaluation Of End-to-end Children's Automatic Speech Recognition Adaptation

Citations: 0
Authors
Rolland, Thomas [1 ,2 ]
Abad, Alberto [1 ,2 ]
Affiliations
[1] INESC ID, Lisbon, Portugal
[2] Univ Lisbon, Inst Super Tecn, Lisbon, Portugal
Source
INTERSPEECH 2024
Keywords
speech recognition; children speech; transfer learning; over-parameterisation
DOI
10.21437/Interspeech.2024-1102
Abstract
Automatic Speech Recognition (ASR) encounters unique challenges when dealing with children's speech, mainly due to the scarcity of available data. Training large ASR models with constrained data presents a significant challenge. To address this, a fine-tuning strategy is frequently employed. However, fine-tuning an entire large pre-trained model with limited children's speech data may lead to overfitting and decreased performance. This study offers a granular evaluation of children's ASR fine-tuning, departing from conventional whole-network tuning. We present a partial fine-tuning approach spotlighting the importance of the Encoder and Feedforward Neural Network modules in Transformer-based models. Remarkably, this method surpasses the efficacy of whole-model fine-tuning, with a relative word error rate improvement of 9% when dealing with limited data. Our findings highlight the critical role of partial fine-tuning in advancing children's ASR model development.
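The idea of partial fine-tuning described in the abstract can be illustrated with a minimal sketch: choose a subset of parameters to update and keep the rest frozen. The parameter names, the glob pattern, and the helpers `select_trainable` and `freeze_plan` below are hypothetical and chosen only for illustration; this record does not state the paper's exact module selection or toolkit.

```python
from fnmatch import fnmatchcase

# Hypothetical parameter names, loosely modelled on a Transformer
# encoder-decoder ASR model. Real toolkits use their own naming.
PARAM_NAMES = [
    "encoder.layers.0.self_attn.q_proj.weight",
    "encoder.layers.0.ffn.linear1.weight",
    "encoder.layers.0.ffn.linear2.weight",
    "decoder.layers.0.self_attn.q_proj.weight",
    "decoder.layers.0.ffn.linear1.weight",
    "output_layer.weight",
]


def select_trainable(names, patterns):
    """Return, in order, the names that match at least one glob pattern."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


def freeze_plan(names, patterns):
    """Map every parameter name to True (update it) or False (keep frozen)."""
    trainable = set(select_trainable(names, patterns))
    return {n: n in trainable for n in names}


# Assumed example: update only the encoder feed-forward weights.
plan = freeze_plan(PARAM_NAMES, ["encoder.*.ffn.*"])

for name, is_trainable in plan.items():
    print(f"{'train ' if is_trainable else 'freeze'}  {name}")
```

In a framework such as PyTorch, a plan like this would typically be applied by setting `requires_grad` on each named parameter before building the optimizer, so that only the selected subset receives gradient updates.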
Pages: 5178-5182
Page count: 5
Related papers
50 in total
  • [21] Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss
    Shakeel, Muhammad
    Sudo, Yui
    Peng, Yifan
    Watanabe, Shinji
    INTERSPEECH 2024, 2024, : 3909 - 3913
  • [22] Insertion-Based Modeling for End-to-End Automatic Speech Recognition
    Fujita, Yuya
    Watanabe, Shinji
    Omachi, Motoi
    Chang, Xuankai
    INTERSPEECH 2020, 2020, : 3660 - 3664
  • [23] Controlling the Noise Robustness of End-to-End Automatic Speech Recognition Systems
    Moeller, Matthias
    Twiefel, Johannes
    Weber, Cornelius
    Wermter, Stefan
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [24] Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
    Belinkov, Yonatan
    Glass, James
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [25] Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition
    Belinkov, Yonatan
    Ali, Ahmed
    Glass, James
    INTERSPEECH 2019, 2019, : 81 - 85
  • [26] Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
    Parcollet, Titouan
    Zhang, Ying
    Morchid, Mohamed
    Trabelsi, Chiheb
    Linares, Georges
    De Mori, Renato
    Bengio, Yoshua
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 22 - 26
  • [27] A Neural Time Alignment Module for End-to-End Automatic Speech Recognition
    Jiang, Dongcheng
    Zhang, Chao
    Woodland, Philip C.
    INTERSPEECH 2023, 2023, : 1374 - 1378
  • [28] Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    INTERSPEECH 2019, 2019, : 76 - 80
  • [29] Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin
    Ajisafe, Daniel
    Adegboro, Oluwabukola
    Oduntan, Esther
    Arulogun, Tayo
    arXiv, 2020,
  • [30] Integrated End-to-End Automatic Speech Recognition for Agglutinative Languages
    Bekarystankyzy, Akbayan
    Mamyrbayev, Orken
    Anarbekova, Tolganay
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (06)