Category-based and Target-based Data Augmentation for Dysarthric Speech Recognition Using Transfer Learning

Authors
Nawroly, Sarkhell Sirwan [1]
Popescu, Decebal [1]
Antony, Mariya Celin Thekekara [2]
Affiliations
[1] Natl Univ Sci & Technol POLITEHN Bucharest, Fac Automat Control & Comp Sci, 313 Splaiul Independentei, Bucharest 060042, Romania
[2] Sai Univ, Sch Comp & Data Sci, Paiyanur 603104, Tamil Nadu, India
Source
Studies in Informatics and Control, 2024, Vol. 33, No. 4
Keywords
Dysarthric speech recognition; Noise analysis; Transfer learning approach; Noise
DOI
10.24846/v33i4y202408
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Dysarthric speech recognition poses challenges not faced by recognition systems for typical speech, chiefly because dysarthric speech data are scarce. To address this data sparsity, researchers have developed data augmentation techniques that generate new dysarthric training data from either existing dysarthric speech examples or speech from normal (non-dysarthric) speakers, thereby improving recognition performance. This study creates augmented training examples from dysarthric speech itself, so that the augmented data retain each dysarthric speaker's identity in terms of their characteristic speech errors. A two-stage transfer learning strategy is employed: the first stage introduces a category-specific low-frequency noise augmentation method, and the second stage applies a dysarthric speaker-specific data augmentation approach. The proposed method combines the strengths of several data augmentation approaches from the literature into a refined two-stage model that performs data augmentation without compromising the quality of the target model. Compared with a transfer learning baseline trained only on normal speech data, the two-stage approach reduced the Word Error Rate (WER) by approximately 11.369%, with notable gains particularly among severely affected dysarthric speakers.
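The first-stage idea of adding low-frequency noise to dysarthric utterances can be illustrated with a minimal sketch. The abstract does not state the noise type, cutoff frequency, signal-to-noise ratio (SNR), or how categories map to settings, so everything below is an assumption: white Gaussian noise is shaped by a first-order low-pass filter and mixed into the speech at a target SNR. The names `lowpass_noise`, `add_noise_at_snr`, `augment`, and `CATEGORY_SETTINGS`, along with every category label and numeric value, are hypothetical placeholders rather than the authors' configuration.

```python
import math
import random


def lowpass_noise(n_samples, sample_rate, cutoff_hz, rng):
    """White Gaussian noise passed through a first-order (one-pole) low-pass filter."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor of the discrete RC low-pass filter
    out, y = [], 0.0
    for _ in range(n_samples):
        # y[n] = y[n-1] + alpha * (x[n] - y[n-1]), with x[n] ~ N(0, 1)
        y += alpha * (rng.gauss(0.0, 1.0) - y)
        out.append(y)
    return out


def power(signal):
    """Mean-square power of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(speech, noise)]


# HYPOTHETICAL per-category settings: labels and values are placeholders,
# not taken from the paper.
CATEGORY_SETTINGS = {
    "very_low": {"cutoff_hz": 200.0, "snr_db": 25.0},
    "low": {"cutoff_hz": 250.0, "snr_db": 20.0},
    "mid": {"cutoff_hz": 300.0, "snr_db": 15.0},
    "high": {"cutoff_hz": 400.0, "snr_db": 10.0},
}


def augment(speech, sample_rate, category, seed=0):
    """Return one noisy copy of `speech` using the (hypothetical) settings for `category`."""
    cfg = CATEGORY_SETTINGS[category]
    rng = random.Random(seed)  # seeded so each augmented copy is reproducible
    noise = lowpass_noise(len(speech), sample_rate, cfg["cutoff_hz"], rng)
    return add_noise_at_snr(speech, noise, cfg["snr_db"])


if __name__ == "__main__":
    sr = 16000
    # One second of a 220 Hz tone stands in for a dysarthric utterance.
    speech = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
    noisy = augment(speech, sr, "mid", seed=1)
    residual = [a - b for a, b in zip(noisy, speech)]
    print("achieved SNR (dB):", round(10 * math.log10(power(speech) / power(residual)), 3))
```

Scaling the noise to a fixed SNR, rather than a fixed amplitude, keeps the perturbation proportional to each utterance's loudness, which is one plausible way to add variety without masking a speaker's own articulation errors. A first-order filter rolls off only gently (about 6 dB per octave), so a steeper low-pass design may be preferable in practice.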
Pages: 130