Enhancing the Recognition of Children's Speech on Acoustically Mismatched ASR System

Cited by: 0

Authors:

Shahnawazuddin, S. [1]

Kathania, Hemant Kumar [2]

Sinha, Rohit [1]

Affiliations:

[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, India

[2] Natl Inst Technol Sikkim, Dept Elect & Commun Engn, Sikkim 737139, India

Source:

TENCON 2015 - 2015 IEEE REGION 10 CONFERENCE | 2015

Keywords:

Children ASR; feature projection; adaptation

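DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification code:

0808; 0809

Abstract: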

This paper explores the recognition of children's speech using acoustic models trained on adults' speech data. Under such conditions, the large acoustic mismatch between training and test data causes severe degradation in recognition performance. In our earlier work, a binary weighting of cepstral features as well as of acoustic model parameters was explored to address this mismatch. In this paper, a soft-weighting (SW) scheme is proposed to overcome the information loss incurred by the simple binary weighting scheme. This is achieved through a low-rank projection learned using adults' training data. The transform so derived emphasizes the principal dimensions of acoustic variation in adults' speech. During testing, the transform maps children's test data to the space of the training data and thus suppresses the mismatched dimensions. The proposed scheme is verified experimentally using a recognition system trained on adults' data only as well as another system trained on adults' and children's data pooled together. Acoustic model adaptation is also explored to further enhance system performance. Combining SW with cluster model interpolation leads to a relative improvement of 14% over the baseline.
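The abstract does not state how the low-rank projection is estimated. As a rough illustration only, the sketch below assumes a PCA-style projector learned from training features and applied unchanged to test features. The function names, the rank, and the synthetic 13-dimensional data are hypothetical and are not taken from the paper.

```python
import numpy as np

def learn_low_rank_projection(train_feats, rank):
    """Learn a rank-`rank` projector from training features (assumed PCA-based)."""
    mean = train_feats.mean(axis=0)
    centered = train_feats - mean
    # Rows of vt are the principal directions of the centered training data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank].T              # shape: (dim, rank)
    return mean, basis @ basis.T     # shape: (dim, dim), symmetric and idempotent

def apply_projection(feats, mean, proj):
    """Map features onto the subspace learned from the training data."""
    return (feats - mean) @ proj + mean

# Synthetic stand-ins for cepstral features (hypothetical data, not real speech).
rng = np.random.default_rng(0)
scales = np.array([5.0, 4.0, 3.0] + [0.1] * 10)
adult_feats = rng.normal(size=(200, 13)) * scales   # variation concentrated in 3 dims
child_feats = rng.normal(size=(50, 13))             # variation spread over all dims

mean, proj = learn_low_rank_projection(adult_feats, rank=3)
projected_child = apply_projection(child_feats, mean, proj)
```

Because the projector keeps only the directions that dominate the training data, components of the test features lying outside that subspace are discarded, which is the sense in which mismatched dimensions are suppressed in this sketch.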


Pages: 5
