Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition

Cited by: 5
Authors
Geng, Mengzhe [1]
Liu, Shansong [1]
Yu, Jianwei [1]
Xie, Xurong [2]
Hu, Shoukang [1]
Ye, Zi [1]
Jin, Zengrui [1]
Liu, Xunying [1]
Meng, Helen [1]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Beijing, Peoples R China
Source
Keywords
Speech Disorders; Speech Recognition; Speaker Adaptation; Speech Assessment; Subspace-based Learning; Dysarthria
DOI
10.21437/Interspeech.2021-60
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Automatic recognition of disordered speech remains a highly challenging task. Sources of variability commonly found in normal speech, including accent, age and gender, when compounded with the underlying causes of speech impairment and varying severity levels, create large diversity among speakers. Speaker adaptation techniques therefore play a vital role in current speech recognition systems. Motivated by the spectro-temporal differences between disordered and normal speech, which systematically manifest as articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectro-temporal subspace basis embedding deep features derived by singular value decomposition (SVD) of the speech spectrum are proposed. They facilitate both accurate speech intelligibility assessment and auxiliary-feature-based speaker adaptation of state-of-the-art hybrid DNN and end-to-end disordered speech recognition systems. Experiments conducted on the UASpeech corpus suggest that systems adapted with the proposed spectro-temporal deep features consistently outperformed baseline i-Vector adaptation by up to 2.63% absolute (8.6% relative) reduction in word error rate (WER), with or without data augmentation. Learning hidden unit contribution (LHUC) based speaker adaptation was further applied. The final speaker adapted system using the proposed spectral basis embedding features gave an overall WER of 25.6% on the UASpeech test set of 16 dysarthric speakers.
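The SVD step named in the abstract can be illustrated with a minimal sketch. It assumes a spectrogram stored as a (frequency bins x frames) matrix and keeps the leading singular vectors as spectral and temporal bases. The function name, the number of bases, and the random input are illustrative assumptions, not details taken from the paper, and the sketch omits the deep embedding network that the paper builds on top of such bases.

```python
import numpy as np

def spectro_temporal_bases(spectrogram, num_bases=2):
    """Split a (freq_bins x frames) spectrogram into leading SVD bases.

    Hypothetical helper for illustration only; not the authors' code.
    """
    # Thin SVD: spectrogram = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(spectrogram, full_matrices=False)
    spectral_basis = U[:, :num_bases]      # (freq_bins, num_bases)
    temporal_basis = Vt[:num_bases, :].T   # (frames, num_bases)
    return spectral_basis, s[:num_bases], temporal_basis

# Assumed toy input: 40 filterbank channels, 120 frames of random values.
rng = np.random.default_rng(0)
spec = rng.random((40, 120))

spectral, sing, temporal = spectro_temporal_bases(spec, num_bases=2)

# Rank-2 approximation rebuilt from the retained bases.
approx = spectral @ np.diag(sing) @ temporal.T
```

The columns of `spectral` describe patterns along the frequency axis and the columns of `temporal` describe patterns along the time axis, which is one plain reading of "spectro-temporal subspace basis" under the assumptions above.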
Pages: 4793-4797
Page count: 5