Introduction To Partial Fine-tuning: A Comprehensive Evaluation Of End-to-end Children's Automatic Speech Recognition Adaptation

Authors
Rolland, Thomas [1 ,2 ]
Abad, Alberto [1 ,2 ]
Affiliations
[1] INESC ID, Lisbon, Portugal
[2] Univ Lisbon, Inst Super Tecn, Lisbon, Portugal
Keywords
speech recognition; children speech; transfer learning; over-parameterisation
DOI
10.21437/Interspeech.2024-1102
Abstract
Automatic Speech Recognition (ASR) encounters unique challenges when dealing with children's speech, mainly due to the scarcity of available data. Training large ASR models with constrained data presents a significant challenge. To address this, a fine-tuning strategy is frequently employed. However, fine-tuning an entire large pre-trained model with limited children's speech data may lead to overfitting and decreased performance. This study offers a granular evaluation of children's ASR fine-tuning, departing from conventional whole-network tuning. We present a partial fine-tuning approach that highlights the importance of the Encoder and Feedforward Neural Network modules in Transformer-based models. Remarkably, this method surpasses whole-model fine-tuning, with a relative word error rate improvement of 9% when dealing with limited data. Our findings highlight the critical role of partial fine-tuning in advancing children's ASR model development.
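
The idea of partial fine-tuning described in the abstract can be illustrated with a minimal sketch: only parameters belonging to selected modules (here, the encoder feed-forward blocks) are marked as trainable, and the rest stay frozen. This record does not give the paper's actual configuration, so the parameter names, sizes, and glob pattern below are hypothetical, as are the helper functions; the last helper simply spells out how a relative word error rate improvement is usually computed.

```python
from fnmatch import fnmatchcase

# Hypothetical parameter names and element counts for a tiny
# Transformer ASR model (not taken from the paper).
PARAMS = {
    "encoder.layers.0.self_attn.weight": 4 * 256 * 256,
    "encoder.layers.0.feed_forward.w_1.weight": 256 * 2048,
    "encoder.layers.0.feed_forward.w_2.weight": 2048 * 256,
    "decoder.layers.0.self_attn.weight": 4 * 256 * 256,
    "decoder.layers.0.feed_forward.w_1.weight": 256 * 2048,
    "decoder.layers.0.feed_forward.w_2.weight": 2048 * 256,
}


def select_trainable(param_names, patterns):
    """Return the names that match at least one glob pattern.

    Everything not returned is meant to stay frozen during fine-tuning.
    """
    return {
        name
        for name in param_names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    }


def trainable_fraction(params, trainable):
    """Share of all parameter elements that would be updated."""
    total = sum(params.values())
    updated = sum(size for name, size in params.items() if name in trainable)
    return updated / total


def relative_wer_improvement(wer_baseline, wer_new):
    """Relative improvement of wer_new over wer_baseline (0.09 means 9%)."""
    return (wer_baseline - wer_new) / wer_baseline


# Assumed selection: fine-tune only the encoder feed-forward modules.
trainable = select_trainable(PARAMS, ["encoder.*.feed_forward.*"])
fraction = trainable_fraction(PARAMS, trainable)
```

In a deep learning framework, the same selection would typically be applied by disabling gradient updates for every parameter whose name is not in `trainable`; which modules to unfreeze is exactly the design choice the study evaluates.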
Pages: 5178-5182
Number of pages: 5