Enhancing Automatic Speech Recognition With Personalized Models: Improving Accuracy Through Individualized Fine-Tuning

Authors
Brydinskyi, Vitalii [1]
Sabodashko, Dmytro [1]
Khoma, Yuriy [1]
Podpora, Michal [2]
Konovalov, Alexander [3]
Khoma, Volodymyr [4]
Affiliations
[1] Lviv Polytech Natl Univ, Inst Comp Technol Automat & Metrol, UA-79013 Lvov, Ukraine
[2] Opole Univ Technol, Dept Comp Sci, PL-45758 Opole, Poland
[3] Vidby AG, CH-6343 Risch Rotkreuz, Switzerland
[4] Opole Univ Technol, Dept Control Engn, PL-45758 Opole, Poland
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Automatic speech recognition; Transformers; Natural language processing; Speech processing; Sound recognition
DOI
10.1109/ACCESS.2024.3443811
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic speech recognition (ASR) systems have become increasingly popular in recent years due to their ability to convert spoken language into text. Nonetheless, despite their widespread use, existing speaker-independent ASR systems frequently encounter challenges related to variations in speaking styles, accents, and vocal characteristics, leading to potential recognition inaccuracies. This study delves into the feasibility of personalized ASR systems that adapt to the unique voice attributes of individual speakers, thereby enhancing recognition accuracy. It provides an overview of our methodology, focusing on the design, development, and evaluation of both speaker-independent and personalized ASR systems. The dataset used included diverse speakers selected from three extensive datasets: TedLIUM-3, CommonVoice, and GoogleVoice, demonstrating the capability of our methodology to accommodate various accents and challenges of both natural and synthetic voices. In terms of signal classification and interpretation, the personalized model eclipsed the speaker-independent variant, registering an enhancement of up to approximately 3% for natural voices and approximately 10% for synthetic voices in recognition accuracy for individual speakers. Our findings demonstrate that personalized ASR systems can significantly improve the accuracy of speech recognition for individual speakers and highlight the importance of adapting ASR models to individual voices.
Pages: 116649-116656
Page count: 8