A Fast Adaptation Approach for Enhanced Automatic Recognition of Children's Speech with Mismatched Acoustic Models

Times cited: 4
Authors
Shahnawazuddin, S. [1]
Sinha, Rohit [1]
Affiliation
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, India
Keywords
Children's speech recognition; Acoustic mismatch; Low-rank feature projection; Fast adaptation; SPEAKER ADAPTATION; BODY-SIZE
DOI
10.1007/s00034-017-0586-6
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline codes
0808; 0809
Abstract
This study explores issues in automatic speech recognition (ASR) of children's speech on acoustic models trained using adults' speech. For acoustic modeling in ASR, the front-end features are designed to capture the characteristics of the vocal tract filter while smoothing out those of the source (excitation). Adults' and children's speech differ significantly because of large deviations in acoustic correlates such as pitch, formant frequencies, and speaking rate. When children's speech is recognized on mismatched acoustic models, recognition performance remains severely degraded despite the use of vocal tract length normalization (VTLN) to address the formant mismatch. For the commonly used mel-filterbank-based cepstral features, earlier studies have shown that the acoustic mismatch is exacerbated by insufficient smoothing of pitch harmonics for child speakers. To address this problem, an earlier work explored a structured low-rank projection of the test features as well as of the mean and covariance parameters of the acoustic models. In this paper, a low-latency adaptation scheme is presented for children's ASR on mismatched acoustic models. The proposed fast adaptation approach exploits the previously reported low-rank projection technique to reduce the computational cost. In the proposed approach, development data from the children's domain are partitioned into groups according to their estimated VTLN warp factors. For each warp factor, an adapted acoustic model is then created by combining the low-rank projection with model-space adaptation. Given a child's test utterance, an appropriate pre-adapted model mean supervector is first chosen based on the utterance's estimated warp factor, and the chosen supervector is then optimally scaled. Consequently, only two parameters need to be estimated: a warp factor and a model mean scaling factor. Even with such stringent constraints, the proposed adaptation technique results in a relative improvement of about over the VTLN-included baseline.
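The two-parameter test-time step described in the abstract can be illustrated with a small sketch. It rests on several loudly stated assumptions that are not taken from the paper. The acoustic model is reduced to a single diagonal-covariance GMM rather than HMM states. The warp-factor grid `WARP_GRID` and the functions `group_by_warp`, `select_supervector`, `gmm_log_likelihood`, and `best_scale` are hypothetical names. The test utterance's warp factor is assumed to have been estimated already by standard VTLN. The "optimal" scale is found by a maximum-likelihood grid search, which may differ from the criterion the authors use. The low-rank projection and model-space adaptation that produce the pre-adapted supervectors are not shown; the bank of supervectors is simply given.

```python
import math
from collections import defaultdict

# HYPOTHETICAL warp-factor grid; the grid used in the paper may differ.
WARP_GRID = [0.80, 0.84, 0.88, 0.92, 0.96, 1.00]


def group_by_warp(utterances):
    """Partition development utterances by estimated VTLN warp factor.

    `utterances` is a list of (utterance_id, warp) pairs; each warp is
    snapped to the nearest value in WARP_GRID.
    """
    groups = defaultdict(list)
    for utt_id, warp in utterances:
        nearest = min(WARP_GRID, key=lambda w: abs(w - warp))
        groups[nearest].append(utt_id)
    return dict(groups)


def select_supervector(estimated_warp, bank):
    """Return (warp, means) for the pre-adapted model closest to the estimate.

    `bank` maps a warp factor to a list of Gaussian mean vectors, i.e. the
    pre-adapted mean supervector split per Gaussian (assumed given here).
    """
    key = min(bank, key=lambda w: abs(w - estimated_warp))
    return key, bank[key]


def gmm_log_likelihood(frames, means, variances, weights):
    """Total log-likelihood of `frames` under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comps = []
        for mu, var, w in zip(means, variances, weights):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comps.append(ll)
        m = max(comps)  # log-sum-exp over mixture components for stability
        total += m + math.log(sum(math.exp(c - m) for c in comps))
    return total


def best_scale(frames, means, variances, weights, scales):
    """Pick the scale applied to the whole mean supervector by grid search.

    ASSUMPTION: maximum likelihood over a fixed grid stands in for the
    paper's optimal scaling; variances and weights are left unchanged.
    """
    def score(alpha):
        scaled = [[alpha * m for m in mu] for mu in means]
        return gmm_log_likelihood(frames, scaled, variances, weights)
    return max(scales, key=score)


# Toy usage: two well-separated 2-D Gaussians per pre-adapted model.
bank = {
    0.88: [[4.0, 4.0], [-4.0, -4.0]],
    1.00: [[3.0, 3.0], [-3.0, -3.0]],
}
variances = [[1.0, 1.0], [1.0, 1.0]]
weights = [0.5, 0.5]
frames = [[4.4, 4.4], [-4.4, -4.4]]  # 1.1 times the 0.88 model's means

warp, means = select_supervector(0.90, bank)   # nearest pre-adapted warp: 0.88
scales = [0.90 + 0.02 * k for k in range(11)]  # 0.90, 0.92, ..., 1.10
alpha = best_scale(frames, means, variances, weights, scales)
print(warp, round(alpha, 2))  # prints: 0.88 1.1
```

In this sketch, per-utterance work reduces to a nearest-neighbour lookup over the pre-adapted models plus a one-dimensional search over the scale factor. This reflects the low-latency motivation stated in the abstract, though the actual estimation procedure in the paper may be organized differently.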
Pages: 1098-1115
Number of pages: 18