A New Corpus of Elderly Japanese Speech for Acoustic Modeling, and a Preliminary Investigation of Dialect-Dependent Speech Recognition

Citations: 0
Authors
Fukuda, Meiko [1 ]
Nishimura, Ryota [1 ]
Nishizaki, Hiromitsu [2 ]
Iribe, Yurie [3 ]
Kitaoka, Norihide [4 ]
Affiliations
[1] Tokushima Univ, Dept Comp Sci, Tokushima, Japan
[2] Univ Yamanashi, Fac Engn, Grad Sch Interdisciplinary Res, Kofu, Yamanashi, Japan
[3] Aichi Prefectural Univ, Sch Informat Sci & Technol, Nagakute, Aichi, Japan
[4] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Keywords
elderly; Japanese; corpus; speech recognition; adaptation; dialect; DEMENTIA;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We have constructed a new speech corpus consisting of the utterances of 221 elderly Japanese people (average age: 79.2), with the aim of improving the accuracy of automatic speech recognition (ASR) for the elderly. ASR is a beneficial modality for people with impaired vision or limited hand movement, including the elderly. However, speech recognition systems using standard recognition models, especially acoustic models, have been unable to achieve satisfactory performance for elderly speakers. Creating more accurate acoustic models of elderly users' speech is therefore essential for improving speech recognition for this group. Using our new corpus, which includes the speech of elderly people living in three regions of Japan, we conducted speech recognition experiments with a variety of DNN-HMM acoustic models. We examined which training data was most suitable for our acoustic models: a standard adult Japanese speech corpus (JNAS), an elderly speech corpus (S-JNAS), or a spontaneous speech corpus (CSJ). We also examined whether adaptation to the dialect of each region improved recognition results. We adapted each of our three acoustic models to all of our speech data, and then re-adapted them using speech from each region. Without adaptation, the best recognition results were obtained with the S-JNAS-trained acoustic models (entire corpus: 21.85% word error rate, WER). However, after adaptation of our acoustic models to our entire corpus, the CSJ-trained models achieved the lowest WERs (entire corpus: 17.42%). Moreover, after re-adaptation to each regional dialect, the CSJ-trained acoustic model tended to show improved recognition rates. We plan to collect more utterances from all over Japan, so that our corpus can be used as a key resource for elderly speech recognition in Japanese. We also hope to achieve further improvement in recognition performance for elderly speech.
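The results in the abstract are reported as word error rate (WER). The record does not say how the authors computed it, so the following is only a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the sketch assumes both transcripts are already split into whitespace-separated words, which for Japanese would require a prior segmentation step.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Assumption: both inputs are already tokenized into whitespace-separated
    words (Japanese text would need word segmentation beforehand).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Toy example: one substitution and one deletion over four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many insertions; corpus-level figures are normally obtained by summing errors and reference words over all utterances rather than averaging per-utterance rates.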
Pages: 78-83
Page count: 6