A Fast Adaptation Approach for Enhanced Automatic Recognition of Children's Speech with Mismatched Acoustic Models

Cited by: 4

Authors:

Shahnawazuddin, S. [1]

Sinha, Rohit [1]

Affiliations:

[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, India

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2018 / Vol. 37 / Issue 03

Keywords:

Children's speech recognition; Acoustic mismatch; Low-rank feature projection; Fast adaptation; SPEAKER ADAPTATION; BODY-SIZE;

DOI:

10.1007/s00034-017-0586-6

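Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: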

0808 ; 0809 ;

Abstract:

This study explores issues in automatic speech recognition (ASR) of children's speech on acoustic models trained using adults' speech. For acoustic modeling in ASR, the employed front-end features capture the characteristics of the vocal filter while smoothing out those of the source (excitation). Adults' and children's speech differ significantly due to large deviation in the acoustic correlates such as pitch, formants, speaking rate, etc. In the context of children's speech recognition on mismatched acoustic models, the recognition rates remain highly degraded despite use of the vocal tract length normalization (VTLN) for addressing formant mismatch. For commonly used mel-filterbank-based cepstral features, earlier studies have shown that the acoustic mismatch is exacerbated by insufficient smoothing of pitch harmonics for child speakers. To address this problem, a structured low-rank projection of the test features as well as that of the mean and the covariance parameters of the acoustic models was explored in an earlier work. In this paper, a low-latency adaptation scheme is presented for children's mismatched ASR. The presented fast adaptation approach exploits the earlier reported low-rank projection technique in order to reduce the computational cost. In the proposed approach, developmental data from the children's domain is partitioned into separate groups on the basis of their estimated VTLN warp factors. A set of adapted acoustic models is then created by combining the low-rank projection with the model space adaptation technique for each of the warp factors. Given the children's test utterance, first an appropriate pre-adapted model mean supervector is chosen based on its estimated warp factor. The chosen supervector is then optimally scaled. Consequently, only two parameters are required to be estimated, i.e., a warp factor and a model mean scaling factor. 
Even with such stringent constraints, the proposed adaptation technique results in a relative improvement of about over the VTLN included baseline.
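
The two-parameter test-time step described in the abstract (pick a pre-adapted mean supervector by warp factor, then scale it) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `select_and_scale`, the dictionary layout keyed by warp factor, the nearest-factor lookup, and in particular the least-squares scaling rule, which stands in for whatever optimality criterion the authors actually use. The `target` vector is a hypothetical statistic derived from the test utterance.

```python
def select_and_scale(pre_adapted_means, warp_factor, target):
    """Illustrative only: choose a pre-adapted mean supervector and scale it.

    pre_adapted_means: dict mapping a VTLN warp factor (float) to a mean
        supervector (list of floats), one entry per warp-factor group.
    warp_factor: warp factor estimated for the test utterance.
    target: hypothetical per-utterance statistic with the same length as
        the supervectors (an assumption, not from the paper).
    """
    # Parameter 1: the warp factor selects one pre-adapted supervector.
    nearest = min(pre_adapted_means, key=lambda f: abs(f - warp_factor))
    mu = pre_adapted_means[nearest]

    # Parameter 2: a single scalar applied to the whole supervector.
    # Assumption: a least-squares fit replaces the paper's own criterion.
    numerator = sum(m * t for m, t in zip(mu, target))
    denominator = sum(m * m for m in mu)
    scale = numerator / denominator

    return nearest, scale, [scale * m for m in mu]


# Toy usage with made-up numbers.
means = {0.88: [1.0, 2.0], 1.00: [2.0, 0.0]}
chosen, alpha, scaled = select_and_scale(means, 0.90, [2.0, 4.0])
print(chosen, alpha, scaled)
```

The point of the sketch is only that test-time estimation reduces to one lookup and one scalar, which is what keeps the adaptation latency low.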

Pages: 1098 - 1115

Number of pages: 18
