Aging speech recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech

Cited by: 4

Authors:

Das, Biswajit [1]

Mandal, Sandipan [1]

Mitra, Pabitra [1]

Basu, Anupam [1]

Affiliations:

[1] Indian Inst Technol, Dept Comp Sci & Engn, Kharagpur 721302, W Bengal, India

Source:

PATTERN RECOGNITION LETTERS | 2013 / Vol. 34 / Issue 03

Keywords:

Aging speech recognition; Vocal tract length normalization (VTLN); Maximum likelihood linear transform (MLLT); Maximum likelihood linear regression (MLLR); Maximum a posteriori (MAP); Maximum mutual information estimation (MMIE); VOCAL-TRACT; EXPECTATION MAXIMIZATION; NORMALIZATION; AGE;

DOI:

10.1016/j.patrec.2012.10.029

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This article describes the development of a Bengali speech recognition system for the aging population using various adaptation techniques. Variability in acoustic characteristics among speakers degrades speech recognition accuracy. In general, perceptual as well as acoustic variations exist among speakers, but they are more pronounced between young and aged populations. Deviations in voice source features between the two age groups affect speech recognition performance. Existing automatic speech recognition algorithms demand a large amount of training data covering all of this variability to build a robust system. Speaker normalization and adaptation techniques, however, attempt to reduce inter-speaker or intra-speaker acoustic variability without requiring a large amount of training data. In the current study, conventional acoustic model adaptation methods, e.g. vocal tract length normalization, maximum likelihood linear regression and/or maximum a posteriori, are combined to improve recognition accuracy. Moreover, the maximum mutual information estimation technique has been implemented in this study. (C) 2012 Elsevier B.V. All rights reserved.


Pages: 335-343

Number of pages: 9
