Aging speech recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech

Cited by: 4
Authors
Das, Biswajit [1 ]
Mandal, Sandipan [1 ]
Mitra, Pabitra [1 ]
Basu, Anupam [1 ]
Affiliations
[1] Indian Inst Technol, Dept Comp Sci & Engn, Kharagpur 721302, W Bengal, India
Keywords
Aging speech recognition; Vocal tract length normalization (VTLN); Maximum likelihood linear transform (MLLT); Maximum likelihood linear regression (MLLR); Maximum a posteriori (MAP); Maximum mutual information estimation (MMIE); VOCAL-TRACT; EXPECTATION MAXIMIZATION; NORMALIZATION; AGE;
DOI
10.1016/j.patrec.2012.10.029
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article describes the development of a Bengali speech recognition system for the aging population using various adaptation techniques. Variability in acoustic characteristics among speakers degrades speech recognition accuracy. In general, perceptual as well as acoustic variations exist among speakers, but they are more pronounced between young and aged populations. Deviations in voice source features between the two age groups affect speech recognition performance. Existing automatic speech recognition algorithms demand a large amount of training data covering all of this variability in order to build a robust system. Speaker normalization and adaptation techniques, however, attempt to reduce inter-speaker or intra-speaker acoustic variability without requiring a large amount of training data. In the current study, conventional acoustic model adaptation methods, e.g. vocal tract length normalization, maximum likelihood linear regression and/or maximum a posteriori, are combined to improve recognition accuracy. Moreover, the maximum mutual information estimation technique has been implemented. (C) 2012 Elsevier B.V. All rights reserved.
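As an illustration of one technique named in the abstract, the following is a minimal sketch of the textbook maximum a posteriori (MAP) update for a single Gaussian mean. It is not taken from the paper: the function name `map_adapt_mean`, the prior weight `tau`, and the toy data are all assumptions made for this example, and the study's actual system may differ.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Textbook MAP update of one Gaussian mean (illustrative only).

    prior_mean : mean from the speaker-independent model (list of floats)
    frames     : adaptation feature vectors (list of lists of floats)
    posteriors : occupation probability of this Gaussian for each frame
    tau        : hypothetical prior weight; larger values trust the prior more
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)  # total soft count of adaptation frames
    weighted_sum = [0.0] * dim
    for gamma, frame in zip(posteriors, frames):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    # Interpolate between the prior mean and the adaptation-data statistics.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


if __name__ == "__main__":
    adapted = map_adapt_mean(
        [0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], tau=2.0
    )
    print(adapted)
```

With no adaptation frames the function returns the prior mean unchanged, and as the occupancy grows the result moves toward the mean of the adaptation data, which is why MAP suits the limited-data setting the abstract describes.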
Pages: 335 - 343
Number of pages: 9
Related papers
50 in total
  • [41] Effect of aging on speech features and phoneme recognition: A study on Bengali voicing vowels
    Das B.
    Mandal S.
    Mitra P.
    Basu A.
    Das, B. (biswajit.net@gmail.com), 1600, Kluwer Academic Publishers (16): : 19 - 31
  • [42] Probabilistic Latent Speaker Analysis for Large Vocabulary Speech Recognition
    Su, Dan
    Wu, Xihong
    Chi, Huisheng
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1889 - 1892
  • [43] Probabilistic Latent Speaker Training for Large Vocabulary Speech Recognition
    Su, Dan
    Wu, Xihong
    Chi, Huisheng
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1225 - 1228
  • [44] Parallel Scalability in Speech Recognition Inference engines in large vocabulary continuous speech recognition
    You, Kisun
    Chong, Jike
    Yi, Youngmin
    Gonina, Ekaterina
    Hughes, Christopher J.
    Chen, Yen-Kuang
    Sung, Wonyong
    Keutzer, Kurt
    IEEE SIGNAL PROCESSING MAGAZINE, 2009, 26 (06) : 124 - 135
  • [45] Eigen-Mllrs applied to unsupervised speaker enrollment for large vocabulary continuous speech recognition
    Aubert, XL
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 349 - 352
  • [46] Probabilistic Speaker-Class based Acoustic Modeling for Large Vocabulary Continuous Speech Recognition
    Li, Xiangang
    Su, Dan
    Pang, Zaihu
    Wu, Xihong
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1218 - 1221
  • [47] Continuous speech recognition using an on-line speaker adaptation method based on automatic speaker clustering
    Zhang, W
    Nakagawa, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2003, E86D (03) : 464 - 473
  • [48] Bengali Speech Emotion Recognition
    Mohanta, Abhijit
    Sharma, Uzzal
    PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 2812 - 2814
  • [49] A Combined Speaker Adaptation Method for Mandarin Speech Recognition
    徐向华
    朱杰
    Journal of Shanghai Jiaotong University, 2004, (04) : 21 - 24
  • [50] SPEAKER ADAPTATION USING SPECTRAL INTERPOLATION FOR SPEECH RECOGNITION
    SHINODA, K
    ISO, KI
    WATANABE, T
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, 1994, 77 (10) : 1 - 11