A Fast Adaptation Approach for Enhanced Automatic Recognition of Children's Speech with Mismatched Acoustic Models

Cited by: 4
Authors
Shahnawazuddin, S. [1]
Sinha, Rohit [1]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, India
Keywords
Children's speech recognition; Acoustic mismatch; Low-rank feature projection; Fast adaptation; Speaker adaptation; Body size
DOI
10.1007/s00034-017-0586-6
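Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract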
This study explores issues in automatic speech recognition (ASR) of children's speech using acoustic models trained on adults' speech. For acoustic modeling in ASR, the front-end features capture the characteristics of the vocal filter while smoothing out those of the source (excitation). Adults' and children's speech differ significantly owing to large deviations in acoustic correlates such as pitch, formants, and speaking rate. In children's speech recognition on mismatched acoustic models, recognition rates remain highly degraded even when vocal tract length normalization (VTLN) is used to address the formant mismatch. For the commonly used mel-filterbank-based cepstral features, earlier studies have shown that the acoustic mismatch is exacerbated by insufficient smoothing of pitch harmonics for child speakers. To address this problem, an earlier work explored a structured low-rank projection of the test features as well as of the mean and covariance parameters of the acoustic models. In this paper, a low-latency adaptation scheme is presented for children's mismatched ASR. The presented fast adaptation approach exploits the earlier reported low-rank projection technique to reduce the computational cost. In the proposed approach, development data from the children's domain is partitioned into separate groups on the basis of the estimated VTLN warp factors. A set of adapted acoustic models is then created, one for each warp factor, by combining the low-rank projection with a model-space adaptation technique. Given a child's test utterance, an appropriate pre-adapted model mean supervector is first chosen based on the utterance's estimated warp factor. The chosen supervector is then optimally scaled. Consequently, only two parameters need to be estimated: a warp factor and a model mean scaling factor. Even with such stringent constraints, the proposed adaptation technique yields a relative improvement over the VTLN-included baseline.
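The test-time step described in the abstract (pick a pre-adapted mean supervector by warp factor, then scale it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `select_supervector` and `ml_scale` are hypothetical, the supervector is a flat list of Gaussian mean components, the covariances are diagonal, and the scale is the closed-form maximum-likelihood value for a single scalar multiplying all means. The abstract does not confirm that this is the estimator used in the paper.

```python
def select_supervector(pre_adapted, warp):
    """Return the pre-adapted mean supervector whose warp factor is nearest.

    pre_adapted: dict mapping a VTLN warp factor to a flat list of means
                 (hypothetical storage layout, assumed for this sketch).
    warp:        warp factor estimated for the test utterance.
    """
    nearest = min(pre_adapted, key=lambda alpha: abs(alpha - warp))
    return pre_adapted[nearest]


def ml_scale(means, variances, first_order, occupancy):
    """Closed-form scalar s maximizing the Gaussian auxiliary function
    when every mean is replaced by s * mean (diagonal covariances assumed).

    All arguments are flat lists aligned per mean component:
      means[i]       component of the chosen supervector
      variances[i]   matching diagonal variance
      first_order[i] sum over frames of posterior * observation
      occupancy[i]   sum over frames of posterior for that Gaussian
    """
    num = 0.0
    den = 0.0
    for mu, var, x, gamma in zip(means, variances, first_order, occupancy):
        num += mu * x / var          # cross term between model mean and data
        den += gamma * mu * mu / var  # energy of the model mean
    return num / den


# Toy usage with made-up numbers (not from the paper).
pre_adapted = {0.88: [1.0, 2.0], 0.92: [1.5, 2.5]}
chosen = select_supervector(pre_adapted, warp=0.89)
scale = ml_scale(chosen, [1.0, 1.0], [12.0, 24.0], [10.0, 10.0])
adapted_means = [scale * mu for mu in chosen]
```

In this toy case the data statistics correspond to observations centered at 1.2 times the chosen means, so the estimated scale comes out at 1.2. Only the warp factor and this one scalar are estimated per utterance, which is what keeps the latency low.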
Pages: 1098-1115
Page count: 18