A New Corpus of Elderly Japanese Speech for Acoustic Modeling, and a Preliminary Investigation of Dialect-Dependent Speech Recognition

Cited by: 0
Authors
Fukuda, Meiko [1 ]
Nishimura, Ryota [1 ]
Nishizaki, Hiromitsu [2 ]
Iribe, Yurie [3 ]
Kitaoka, Norihide [4 ]
Affiliations
[1] Tokushima Univ, Dept Comp Sci, Tokushima, Japan
[2] Univ Yamanashi, Fac Engn, Grad Sch Interdisciplinary Res, Kofu, Yamanashi, Japan
[3] Aichi Prefectural Univ, Sch Informat Sci & Technol, Nagakute, Aichi, Japan
[4] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Keywords
elderly; Japanese; corpus; speech recognition; adaptation; dialect; dementia
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We have constructed a new speech data corpus consisting of the utterances of 221 elderly Japanese people (average age: 79.2) with the aim of improving the accuracy of automatic speech recognition (ASR) for the elderly. ASR is a beneficial modality for people with impaired vision or limited hand movement, including the elderly. However, speech recognition systems using standard recognition models, especially acoustic models, have been unable to achieve satisfactory performance for the elderly. Thus, creating more accurate acoustic models of the speech of elderly users is essential for improving speech recognition for the elderly. Using our new corpus, which includes the speech of elderly people living in three regions of Japan, we conducted speech recognition experiments using a variety of DNN-HMM acoustic models. As training data for our acoustic models, we examined whether a standard adult Japanese speech corpus (JNAS), an elderly speech corpus (S-JNAS), or a spontaneous speech corpus (CSJ) was most suitable, and whether or not adaptation to the dialect of each region improved recognition results. We adapted each of our three acoustic models to all of our speech data, and then re-adapted them using speech from each region. Without adaptation, the best recognition results were obtained with the S-JNAS-trained acoustic models (entire corpus: 21.85% word error rate, WER). However, after adaptation of our acoustic models to our entire corpus, the CSJ-trained models achieved the lowest WERs (entire corpus: 17.42%). Moreover, after re-adaptation to each regional dialect, the CSJ-trained acoustic models adapted to regional speech data tended to show improved recognition rates. We plan to collect more utterances from all over Japan, so that our corpus can be used as a key resource for elderly speech recognition in Japanese. We also hope to achieve further improvement in recognition performance for elderly speech.
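The error rates quoted above (21.85% and 17.42%) are word error rates: the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following minimal Python sketch makes the quoted percentages concrete; the function name and the toy romanized word sequences are illustrative only and are not taken from the paper or its toolkit.

from typing import List

def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    # WER = (substitutions + deletions + insertions) / number of reference words,
    # obtained from a Levenshtein (edit-distance) alignment over word tokens.
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i              # delete all remaining reference words
    for j in range(cols):
        dist[0][j] = j              # insert all remaining hypothesis words
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[rows - 1][cols - 1] / max(len(reference), 1)

# Hypothetical example: one substituted word in a four-word reference -> WER = 0.25
print(word_error_rate("kyou wa ii tenki".split(), "kyou ga ii tenki".split()))

Standard ASR toolkits report the same quantity, usually averaged over all utterances in a test set.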
Pages: 78-83
Number of pages: 6