Enhancing Automatic Speech Recognition With Personalized Models: Improving Accuracy Through Individualized Fine-Tuning

Cited by: 0
Authors
Brydinskyi, Vitalii [1]
Sabodashko, Dmytro [1]
Khoma, Yuriy [1]
Podpora, Michal [2]
Konovalov, Alexander [3]
Khoma, Volodymyr [4]
Affiliations
[1] Lviv Polytech Natl Univ, Inst Comp Technol Automat & Metrol, UA-79013 Lvov, Ukraine
[2] Opole Univ Technol, Dept Comp Sci, PL-45758 Opole, Poland
[3] Vidby AG, CH-6343 Risch Rotkreuz, Switzerland
[4] Opole Univ Technol, Dept Control Engn, PL-45758 Opole, Poland
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Automatic speech recognition; Transformers; Natural language processing; Speech processing; Sound recognition
DOI
10.1109/ACCESS.2024.3443811
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic speech recognition (ASR) systems have become increasingly popular in recent years due to their ability to convert spoken language into text. Nonetheless, despite their widespread use, existing speaker-independent ASR systems frequently encounter challenges related to variations in speaking styles, accents, and vocal characteristics, leading to potential recognition inaccuracies. This study delves into the feasibility of personalized ASR systems that adapt to the unique voice attributes of individual speakers, thereby enhancing recognition accuracy. It provides an overview of our methodology, focusing on the design, development, and evaluation of both speaker-independent and personalized ASR systems. The dataset used included diverse speakers selected from three extensive datasets: TedLIUM-3, CommonVoice, and GoogleVoice, demonstrating the capability of our methodology to accommodate various accents and challenges of both natural and synthetic voices. In terms of signal classification and interpretation, the personalized model eclipsed the speaker-independent variant, registering an enhancement of up to ~3% for natural voices and ~10% for synthetic voices in recognition accuracy for individual speakers. Our findings demonstrate that personalized ASR systems can significantly improve the accuracy of speech recognition for individual speakers and highlight the importance of adapting ASR models to individual voices.
Pages: 116649-116656
Number of pages: 8
Related papers
50 in total
  • [21] Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition
    Siriwardhana, Shamane
    Reis, Andrew
    Weerasekera, Rivindu
    Nanayakkara, Suranga
    INTERSPEECH 2020, 2020, : 3755 - 3759
  • [22] A Deep Transfer Learning Approach to Fine-Tuning Facial Recognition Models
    Luttrell, Joseph
    Zhou, Zhaoxian
    Zhang, Yuanyuan
    Zhang, Chaoyang
    Gong, Ping
    Yang, Bei
    Li, Runzhi
    PROCEEDINGS OF THE 2018 13TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2018), 2018, : 2671 - 2676
  • [23] Introduction To Partial Fine-tuning: A Comprehensive Evaluation Of End-to-end Children's Automatic Speech Recognition Adaptation
    Rolland, Thomas
    Abad, Alberto
    INTERSPEECH 2024, 2024, : 5178 - 5182
  • [24] Improving optimization of convolutional neural networks through parameter fine-tuning
    Becherer, Nicholas
    Pecarina, John
    Nykl, Scott
    Hopkinson, Kenneth
    NEURAL COMPUTING & APPLICATIONS, 2019, 31 (08): : 3469 - 3479
  • [25] Improving optimization of convolutional neural networks through parameter fine-tuning
    Nicholas Becherer
    John Pecarina
    Scott Nykl
    Kenneth Hopkinson
    Neural Computing and Applications, 2019, 31 : 3469 - 3479
  • [26] Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement
    Yang, Hejung
    Kang, Hong-Goo
    INTERSPEECH 2023, 2023, : 814 - 818
  • [27] Improving Speech Recognition through Automatic Selection of Age Group - Specific Acoustic Models
    Hämäläinen, Annika
    Meinedo, Hugo
    Tjalve, Michael
    Pellegrini, Thomas
    Trancoso, Isabel
    Dias, Miguel Sales
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, 2014, 8775 : 12 - 23
  • [28] Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework
    Hao, Sichong
    Shi, Xianjun
    Liu, Hongwei
    Shu, Yanjun
    2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION, ICSME, 2023, : 136 - 146
  • [29] Language Models Fine-Tuning for Automatic Format Reconstruction of SEC Financial Filings
    Lombardo, Gianfranco
    Trimigno, Giuseppe
    Pellegrino, Mattia
    Cagnoni, Stefano
    IEEE ACCESS, 2024, 12 : 31249 - 31261
  • [30] Enhancing Task Performance in Continual Instruction Fine-tuning Through Format Uniformity
    Tan, Xiaoyu
    Cheng, Leijun
    Qiu, Xihe
    Shi, Shaojie
    Cheng, Yuan
    Chu, Wei
    Xu, Yinghui
    Qi, Yuan
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2384 - 2389