Residual Convolutional Neural Network-Based Dysarthric Speech Recognition

Authors
Kumar, Raj [1]
Tripathy, Manoj [1]
Anand, R. S. [1]
Kumar, Niraj [2]
Affiliations
[1] Indian Inst Technol Roorkee, Elect Engn Dept, Roorkee, Uttaranchal, India
[2] All India Inst Med Sci, Neurol, Bibinagar, Telangana, India
Keywords
Dysarthric speech; UASpeech database; Residual convolutional neural network; Speech recognition; Deep neural network; OUTPUT COMMUNICATION; SPEAKERS
DOI
10.1007/s13369-024-08919-5
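Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract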
People with dysarthric speech have difficulty communicating with other people and with voice-based smart devices. This paper presents a dysarthric speech recognition (DSR) system based on a spatial residual convolutional neural network (RCNN), intended to improve communication for individuals with dysarthric speech. The RCNN model is simplified to an optimal number of layers. The system uses a speaker-adaptive approach that incorporates transfer learning, to leverage knowledge learned from healthy individuals, and a new data augmentation technique, to address voice hoarseness in patients. Dysarthric speech is preprocessed with a novel voice cropping technique based on erosion and dilation, which removes unnecessary pauses and hiccups in the time domain. Compared with previously reported results, isolated word recognition accuracy improved by nearly 8.16% for patients with very low intelligibility and by 4.74% for patients with low intelligibility. The proposed DSR system achieves the lowest word error rate, 24.09%, on the UASpeech dysarthric speech dataset of 15 dysarthric speakers.
Pages: 11
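The abstract mentions a voice cropping step based on erosion and dilation but gives no details. The following is a minimal sketch of how such a step could look, not the authors' method. It assumes a per-frame energy threshold produces a binary voice mask, and that a morphological opening (erosion followed by dilation) then discards bursts shorter than the window, such as hiccups, while thresholding alone drops the pauses. The function names, `frame_len`, `threshold`, and `width` are all hypothetical choices made for illustration.

```python
from typing import List


def frame_energy(signal: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in signal[s:s + frame_len]) / frame_len
        for s in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def erode(mask: List[bool], width: int) -> List[bool]:
    """1-D binary erosion: keep a frame only if every frame in the
    centred window is voiced (positions outside the signal count as silent)."""
    half, n = width // 2, len(mask)
    return [
        all(0 <= j < n and mask[j] for j in range(i - half, i + half + 1))
        for i in range(n)
    ]


def dilate(mask: List[bool], width: int) -> List[bool]:
    """1-D binary dilation: mark a frame if any frame in the
    centred window is voiced."""
    half, n = width // 2, len(mask)
    return [
        any(mask[j] for j in range(max(0, i - half), min(n, i + half + 1)))
        for i in range(n)
    ]


def crop_voice(signal: List[float], frame_len: int = 4,
               threshold: float = 0.01, width: int = 3) -> List[float]:
    """Keep only the samples of frames that survive the opening.
    `width` is assumed to be odd; all defaults are illustrative."""
    mask = [e > threshold for e in frame_energy(signal, frame_len)]
    # Opening = erosion then dilation: voiced runs shorter than `width`
    # frames vanish, longer runs are restored to their original extent.
    mask = dilate(erode(mask, width), width)
    cropped: List[float] = []
    for i, keep in enumerate(mask):
        if keep:
            cropped.extend(signal[i * frame_len:(i + 1) * frame_len])
    return cropped


if __name__ == "__main__":
    silence, loud = [0.0] * 4, [0.5] * 4
    # 2 silent frames, a 1-frame burst, 3 silent frames, a 4-frame word, 2 silent frames
    demo = silence * 2 + loud + silence * 3 + loud * 4 + silence * 2
    print(len(demo), len(crop_voice(demo)))
```

In this toy input the one-frame burst is shorter than the three-frame window, so the opening removes it, while the four-frame word is kept in full. A real system would tune the frame length, threshold, and window to the recordings, and might add a closing step to bridge brief gaps inside a word.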