Category-based and Target-based Data Augmentation for Dysarthric Speech Recognition Using Transfer Learning

Authors
Nawroly, Sarkhell Sirwan [1]
Popescu, Decebal [1]
Antony, Mariya Celin Thekekara [2]
Affiliations
[1] Natl Univ Sci & Technol POLITEHN Bucharest, Fac Automat Control & Comp Sci, 313 Splaiul Independentei, Bucharest 060042, Romania
[2] Sai Univ, Sch Comp & Data Sci, Paiyanur 603104, Tamil Nadu, India
Source
STUDIES IN INFORMATICS AND CONTROL | 2024 / Vol. 33 / No. 04
Keywords
Dysarthric speech recognition; Noise analysis; Transfer learning approach; NOISE;
DOI
10.24846/v33i4y202408
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Dysarthric speech recognition poses challenges that normal speech recognition does not, chiefly because dysarthric speech data is scarce. To address this data sparsity, researchers have developed data augmentation techniques that generate new dysarthric speech data from either the original dysarthric speech examples or speech from normal speakers, thereby improving recognition performance. This study creates augmented training examples from dysarthric speech itself so that the speakers' identity, in terms of their speech errors, is retained. A two-stage transfer learning strategy is employed: the first stage introduces a category-specific low-frequency noise augmentation method, and the second stage applies a dysarthric speaker-specific data augmentation approach. The proposed method combines the advantages of several data augmentation approaches from the literature into a two-stage model that handles data augmentation without compromising the quality of the target model. Compared with a transfer learning method that relies only on normal speech data for training, the two-stage approach achieved a Word Error Rate (WER) reduction of approximately 11.369%, especially among severely affected dysarthric speakers.
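The Word Error Rate quoted in the abstract is conventionally defined as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal Python sketch of that conventional definition, not the authors' evaluation code: the function name `word_error_rate` and the example sentences are hypothetical, and the abstract does not state whether the 11.369% figure is an absolute or a relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference length.

    Illustrative helper only; the name and interface are assumptions,
    not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution ("the" -> "a") and one deletion ("front").
reference = "please open the front door"
hypothesis = "please open a door"
print(f"WER = {word_error_rate(reference, hypothesis):.2f}")  # prints "WER = 0.40"
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.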
Number of pages: 130