Enhancing the Recognition of Children's Speech on Acoustically Mismatched ASR System

Cited by: 0

Authors:

Shahnawazuddin, S. [1]

Kathania, Hemant Kumar [2]

Sinha, Rohit [1]

Affiliations:

[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, India

[2] Natl Inst Technol Sikkim, Dept Elect & Commun Engn, Sikkim 737139, India

Source:

TENCON 2015 - 2015 IEEE REGION 10 CONFERENCE | 2015

Keywords:

Children ASR; feature projection; adaptation;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

This paper explores the problem of recognizing children's speech using acoustic models trained on adults' speech data. Under such conditions, the large acoustic mismatch between training and test data causes a severe degradation in recognition performance. In our earlier work, a binary weighting of cepstral features as well as of acoustic model parameters was explored to address this mismatch. In this paper, a soft-weighting (SW) scheme is proposed to overcome the information loss incurred by the simple binary weighting scheme. This is achieved through a low-rank projection learned using adults' training data. The transform so derived emphasizes the principal dimensions of acoustic variation in adults' speech. During testing, the transform maps children's test data to the space of the training data and thus suppresses the mismatched dimensions. The proposed scheme is verified experimentally using a recognition system trained on adults' data only as well as another system trained on adults' and children's data pooled together. The effectiveness of acoustic model adaptation is also explored to further enhance system performance. Combining SW with cluster model interpolation leads to a relative improvement of 14% over the baseline.
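
The low-rank projection described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the projection is estimated, so this example assumes a PCA-style projector built from the top eigenvectors of the training covariance; that choice, the function names `learn_low_rank_projector` and `project`, the rank of 6, and the synthetic 13-dimensional features are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def learn_low_rank_projector(train_feats, rank):
    """Estimate a rank-`rank` projector from training features.

    train_feats: array of shape (num_frames, num_dims), e.g. adults' cepstra.
    Returns the training mean and a (num_dims, num_dims) projection matrix.
    """
    mean = train_feats.mean(axis=0)
    centered = train_feats - mean
    # Sample covariance of the training features.
    cov = centered.T @ centered / (train_feats.shape[0] - 1)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(cov)
    basis = eigvecs[:, -rank:]      # top-`rank` principal directions
    projector = basis @ basis.T     # symmetric and idempotent
    return mean, projector


def project(feats, mean, projector):
    """Map features onto the retained training subspace (same dimensionality)."""
    return (feats - mean) @ projector + mean


rng = np.random.default_rng(0)
# Synthetic stand-ins for 13-dimensional cepstral features (not real speech).
adult_feats = rng.normal(size=(2000, 13)) * np.linspace(3.0, 0.2, 13)
child_feats = rng.normal(size=(500, 13)) * np.linspace(3.0, 1.5, 13)

# Learn the transform on "adult" data only, then apply it to "child" test data.
mean, projector = learn_low_rank_projector(adult_feats, rank=6)
child_projected = project(child_feats, mean, projector)
print(child_projected.shape)
```

Under this assumption the projector generally has non-zero off-diagonal entries, so each output coefficient is a weighted combination of the input coefficients. That is one way to read "soft" weighting, in contrast to a binary scheme that keeps or discards individual coefficients outright.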


Pages: 5

50 items in total

[31] Exploring the Role of Pitch-Adaptive Cepstral Features in Context of Children's Mismatched ASR
Sinha, Rohit
Shahnawazuddin, S.
Karthik, Patri Satya
2016 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM), 2016
[32] Speech recognition based on acoustically derived segment units
Fukada, T
Bacchiani, M
Paliwal, KK
Sagisaka, Y
ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1077 - 1080
[33] Soft-Weighting Technique for Robust Children Speech Recognition under Mismatched Condition
Kathania, Hemant Kumar
Ghai, Shweta
Sinha, Rohit
2013 ANNUAL IEEE INDIA CONFERENCE (INDICON), 2013
[34] Isolated Word Automatic Speech Recognition (ASR) System using MFCC, DTW & KNN
Imtiaz, Muhammad Atif
Raja, Gulistan
2016 ASIA PACIFIC CONFERENCE ON MULTIMEDIA AND BROADCASTING (APMEDIACAST), 2016, : 106 - 110
[35] THE FAWAISPEECH SYSTEM FOR MULTI-CHANNEL SPEECH RECOGNITION IN ICMC-ASR CHALLENGE
Sun, Yujia
He, Jinxin
Zhang, Yi
Liang, Xiaoming
Wang, Ziyan
Fu, Zhen
Chen, Bo
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 19 - 20
[36] EMOTION RECOGNITION FROM SPEECH: PUTTING ASR IN THE LOOP
Schuller, Bjoern
Batliner, Anton
Steidl, Stefan
Seppi, Dino
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4585 - +
[37] Low-Cost Training of Speech Recognition System for Hindi ASR Challenge 2022
Zatvornitskiy, Alexander
SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 712 - 718
[38] A KALDI-DNN-based ASR system for Italian Experiments on Children Speech
Cosi, Piero
2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015
[39] Adversarial Attacks on Automatic Speech Recognition (ASR): A Survey
Bhanushali, Amisha Rajnikant
Mun, Hyunjun
Yun, Joobeom
IEEE ACCESS, 2024, 12 : 88279 - 88302
[40] Amazigh speech recognition based on the Kaldi ASR toolkit
Barkani F.
Hamidi M.
Laaidi N.
Zealouk O.
Satori H.
Satori K.
International Journal of Information Technology, 2023, 15 (7) : 3533 - 3540
