Acoustic models of the elderly for large-vocabulary continuous speech recognition

Cited by: 22
Authors
Baba, A [1]
Yoshizawa, S
Yamada, M
Lee, A
Shikano, K
Affiliations
[1] Labs Image Informat Sci & Technol, Ikoma 6300101, Japan
[2] Matsushita Elect Works Ltd, Kadoma, Osaka 5718686, Japan
[3] Matsushita Elect Ind Co Ltd, Kyoto 6190237, Japan
[4] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma 6300101, Japan
Keywords
elderly; large-vocabulary continuous speech recognition; acoustic model; speaker adaptation
DOI
10.1002/ecjb.20101
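Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809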
Abstract
Widespread use of large-vocabulary continuous speech recognition systems has recently occurred, encouraging the application of speech recognition techniques to various problems. One of the factors that adversely affect the performance of speech recognition systems is a mismatch between the acoustic properties of the speech of the system user and the acoustic model. The speech of young or middle-aged adults is generally used in constructing the acoustic model. Thus, a mismatch occurs between the model and the acoustic properties of the speech of the elderly, which may degrade the recognition rate. In this study, a large-scale elderly speech database (200 sentences x 301 subjects) is used to train the acoustic model, and the resulting elderly acoustic model is evaluated by using a large-vocabulary continuous speech recognition system. In the experiments, the word recognition rate was improved by 3 to 5% compared to the recognition results of an acoustic model trained by young or middle-aged adult speech, namely, by the JNAS speech database (150 sentences x 260 subjects, average 28.6 years). It is also verified experimentally that the recognition rate is further improved in speaker adaptation to elderly speech by making use of an acoustic model trained by elderly speech. (C) 2004 Wiley Periodicals, Inc.
Pages: 49-57
Page count: 9
Related papers
50 in total
  • [31] Improving Discriminative Training for Robust Acoustic Models in Large Vocabulary Continuous Speech Recognition
    Pylkkonen, Janne
    Kurimo, Mikko
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1210 - 1213
  • [32] A COMMERCIAL LARGE-VOCABULARY DISCRETE SPEECH RECOGNITION SYSTEM - DRAGONDICTATE
    MANDEL, MA
    [J]. LANGUAGE AND SPEECH, 1992, 35 : 237 - 246
  • [33] Large-vocabulary spontaneous speech recognition using a corpus of lectures
    Nishimura, M
    Itoh, N
    [J]. ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, 2003, 86 (08): : 52 - 60
  • [34] Building DNN acoustic models for large vocabulary speech recognition
    Maas, Andrew L.
    Qi, Peng
    Xie, Ziang
    Hannun, Awni Y.
    Lengerich, Christopher T.
    Jurafsky, Daniel
    Ng, Andrew Y.
    [J]. COMPUTER SPEECH AND LANGUAGE, 2017, 41 : 195 - 213
  • [35] Pre-Initialized Composition For Large-Vocabulary Speech Recognition
    Allauzen, Cyril
    Riley, Michael
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 666 - 670
  • [36] Compound words in large-vocabulary German speech recognition systems
    Berton, A
    Fetter, P
    RegelBrietzmann, P
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1165 - 1168
  • [37] Boosting HMM acoustic models in large vocabulary speech recognition
    Meyer, C
    Schramm, H
    [J]. SPEECH COMMUNICATION, 2006, 48 (05) : 532 - 548
  • [38] Profiling Large-Vocabulary Continuous Speech Recognition on Embedded Devices: A Hardware Resource Sensitivity Analysis
    Yu, Kai
    Rutenbar, Rob A.
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1903 - 1906
  • [39] Japanese large-vocabulary continuous-speech recognition using a business-newspaper corpus
    Matsuoka, T
    Ohtsuki, K
    Mori, T
    Yoshida, K
    Furui, S
    Shirai, K
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS, 1997, : 1803 - 1806
  • [40] Japanese large-vocabulary continuous-speech recognition using a business-newspaper corpus
    Matsuoka, T
    Ohtsuki, K
    Mori, T
    Furui, S
    Shirai, K
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 22 - 25