Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech

Cited: 0
Authors
Christensen, H. [1 ]
Aniol, M. B. [2 ]
Bell, P. [2 ]
Green, P. [1 ]
Hain, T. [1 ]
King, S. [2 ]
Swietojanski, P. [2 ]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland
Keywords
Speech recognition; Tandem features; Deep belief neural network; Disordered speech
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recently there has been increasing interest in ways of using out-of-domain (OOD) data to improve automatic speech recognition performance in domains where only limited data is available. This paper focuses on one such domain, namely that of disordered speech, for which only very small databases exist but where normal speech can be considered OOD. Standard approaches for handling small data domains use adaptation from OOD models into the target domain, but here we investigate an alternative approach with its focus on the feature extraction stage: OOD data is used to train feature-generating deep belief neural networks. Using AMI meeting and TED talk datasets, we investigate various tandem-based speaker-independent systems as well as maximum a posteriori adapted speaker-dependent systems. Results on the UASpeech isolated word task of disordered speech are very promising, with our overall best system (using a combination of AMI and TED data) giving a correctness of 62.5%; an increase of 15% over the previously best published results based on conventional model adaptation. We show that the relative benefit of using OOD data varies considerably from speaker to speaker and is only loosely correlated with the severity of a speaker's impairments.
Pages: 3609-3612
Number of pages: 4