Enhancing Automatic Speech Recognition With Personalized Models: Improving Accuracy Through Individualized Fine-Tuning

Cited by: 0
Authors
Brydinskyi, Vitalii [1 ]
Sabodashko, Dmytro [1 ]
Khoma, Yuriy [1 ]
Podpora, Michal [2 ]
Konovalov, Alexander [3 ]
Khoma, Volodymyr [4 ]
Affiliations
[1] Lviv Polytech Natl Univ, Inst Comp Technol Automat & Metrol, UA-79013 Lvov, Ukraine
[2] Opole Univ Technol, Dept Comp Sci, PL-45758 Opole, Poland
[3] Vidby AG, CH-6343 Risch Rotkreuz, Switzerland
[4] Opole Univ Technol, Dept Control Engn, PL-45758 Opole, Poland
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Automatic speech recognition; Transformers; Natural language processing; speech processing; sound recognition
DOI
10.1109/ACCESS.2024.3443811
Chinese Library Classification: TP [automation technology; computer technology]
Discipline code: 0812
Abstract
Automatic speech recognition (ASR) systems have become increasingly popular in recent years due to their ability to convert spoken language into text. Nonetheless, despite their widespread use, existing speaker-independent ASR systems frequently struggle with variations in speaking style, accent, and vocal characteristics, which can lead to recognition errors. This study investigates the feasibility of personalized ASR systems that adapt to the unique voice attributes of individual speakers, thereby enhancing recognition accuracy. It outlines our methodology, focusing on the design, development, and evaluation of both speaker-independent and personalized ASR systems. The evaluation involved diverse speakers selected from three extensive datasets (TedLIUM-3, CommonVoice, and GoogleVoice), demonstrating that the methodology accommodates various accents as well as both natural and synthetic voices. In terms of signal classification and interpretation, the personalized model outperformed the speaker-independent variant, improving recognition accuracy for individual speakers by up to ~3% for natural voices and ~10% for synthetic voices. Our findings demonstrate that personalized ASR systems can significantly improve speech recognition accuracy for individual speakers and highlight the importance of adapting ASR models to individual voices.
Pages: 116649-116656
Number of pages: 8
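
Illustrative sketch (not part of the indexed record): the abstract describes fine-tuning a speaker-independent ASR model on an individual speaker's recordings, but the record does not specify the authors' model or toolkit. The Python sketch below assumes a CTC-based wav2vec 2.0 baseline from the Hugging Face transformers library; the checkpoint name, learning rate, and the personalize/transcribe helpers are hypothetical choices for illustration, not the paper's pipeline.

# Minimal sketch under assumed choices: a speaker-independent CTC baseline is
# further trained on one speaker's (waveform, transcript) pairs, then used to
# decode that speaker's speech.
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

BASELINE_ID = "facebook/wav2vec2-base-960h"  # assumed baseline, not from the paper
processor = Wav2Vec2Processor.from_pretrained(BASELINE_ID)
model = Wav2Vec2ForCTC.from_pretrained(BASELINE_ID)

# Freeze the convolutional feature encoder; with only a small amount of speech
# per speaker, adapting just the transformer layers and CTC head is a common choice.
model.freeze_feature_encoder()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)


def personalize(speaker_clips, epochs=3, sample_rate=16_000):
    """Fine-tune the baseline on one speaker's (waveform, transcript) pairs."""
    model.train()
    for _ in range(epochs):
        for waveform, transcript in speaker_clips:
            inputs = processor(waveform, sampling_rate=sample_rate,
                               return_tensors="pt")
            labels = processor(text=transcript, return_tensors="pt").input_ids
            loss = model(input_values=inputs.input_values, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()


def transcribe(waveform, sample_rate=16_000):
    """Greedy CTC decoding with the (personalized) model."""
    model.eval()
    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(input_values=inputs.input_values).logits
    ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(ids)[0]

Comparing word error rate on held-out clips from the same speaker (for example with the jiwer package) before and after personalization would quantify the kind of per-speaker gain the abstract reports (~3% for natural voices, ~10% for synthetic voices).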