Category-based and Target-based Data Augmentation for Dysarthric Speech Recognition Using Transfer Learning

Authors
Nawroly, Sarkhell Sirwan [1]
Popescu, Decebal [1]
Antony, Mariya Celin Thekekara [2]
Affiliations
[1] Natl Univ Sci & Technol POLITEHN Bucharest, Fac Automat Control & Comp Sci, 313 Splaiul Independentei, Bucharest 060042, Romania
[2] Sai Univ, Sch Comp & Data Sci, Paiyanur 603104, Tamil Nadu, India
Source
Studies in Informatics and Control | 2024 / Vol. 33 / No. 4
Keywords
Dysarthric speech recognition; Noise analysis; Transfer learning approach; NOISE;
DOI
10.24846/v33i4y202408
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Dysarthric speech recognition is harder than recognition of normal speech, largely because dysarthric speech data are scarce. To address this data sparsity, researchers have developed data augmentation techniques that generate new dysarthric speech data from either original dysarthric speech examples or speech from normal speakers, thereby improving recognition performance. This study creates augmented training examples from dysarthric speech itself, so that the identity of each dysarthric speaker, in terms of their speech errors, is retained. It employs a two-stage transfer learning strategy: the first stage introduces a category-specific low-frequency noise augmentation method, and the second stage applies a dysarthric speaker-specific data augmentation approach. The proposed method combines the advantages of several data augmentation approaches from the literature into a two-stage model that handles data augmentation without compromising the quality of the target model. Compared with a transfer learning method trained only on normal speech data, the two-stage approach reduced the Word Error Rate (WER) by approximately 11.369%, especially among severely affected dysarthric speakers.
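The result above is reported as a Word Error Rate reduction. As a reminder of what that metric measures, here is a minimal sketch of the standard WER definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the example sentences are illustrative assumptions, not taken from the paper, and the abstract does not say whether the 11.369% figure is an absolute or a relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Illustrative helper (hypothetical name); not the authors' scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis = i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one substitution ("sat" -> "sit") and one deletion ("the").
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.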
Number of pages: 130