A New Corpus of Elderly Japanese Speech for Acoustic Modeling, and a Preliminary Investigation of Dialect-Dependent Speech Recognition

Cited by: 0
Authors
Fukuda, Meiko [1 ]
Nishimura, Ryota [1 ]
Nishizaki, Hiromitsu [2 ]
Iribe, Yurie [3 ]
Kitaoka, Norihide [4 ]
Affiliations
[1] Tokushima Univ, Dept Comp Sci, Tokushima, Japan
[2] Univ Yamanashi, Fac Engn, Grad Sch Interdisciplinary Res, Kofu, Yamanashi, Japan
[3] Aichi Prefectural Univ, Sch Informat Sci & Technol, Nagakute, Aichi, Japan
[4] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Keywords
elderly; Japanese; corpus; speech recognition; adaptation; dialect; dementia
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We have constructed a new speech corpus consisting of the utterances of 221 elderly Japanese people (average age: 79.2), with the aim of improving the accuracy of automatic speech recognition (ASR) for the elderly. ASR is a beneficial modality for people with impaired vision or limited hand movement, including the elderly. However, speech recognition systems using standard recognition models, especially acoustic models, have been unable to achieve satisfactory performance for elderly speakers. Creating more accurate acoustic models of elderly speech is therefore essential for improving speech recognition for this group. Using our new corpus, which includes the speech of elderly people living in three regions of Japan, we conducted speech recognition experiments using a variety of DNN-HMM acoustic models. We examined which training data was most suitable for our acoustic models: a standard adult Japanese speech corpus (JNAS), an elderly speech corpus (S-JNAS), or a spontaneous speech corpus (CSJ). We also examined whether adaptation to the dialect of each region improved recognition results. We adapted each of our three acoustic models to all of our speech data, and then re-adapted them using speech from each region. Without adaptation, the best recognition results were obtained with the S-JNAS-trained acoustic models (entire corpus: 21.85% word error rate, WER). However, after adaptation of our acoustic models to our entire corpus, the CSJ-trained models achieved the lowest WERs (entire corpus: 17.42%). Moreover, after re-adaptation to each regional dialect, the CSJ-trained acoustic model tended to show improved recognition rates. We plan to collect more utterances from all over Japan, so that our corpus can be used as a key resource for elderly speech recognition in Japanese. We also hope to achieve further improvement in recognition performance for elderly speech.
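The abstract reports its results as word error rates. As a rough illustration of how such a figure is usually obtained, the following minimal sketch computes WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. It is not the authors' scoring tool: the function name `word_error_rate` and the example tokens are hypothetical, and the inputs are assumed to be already segmented into words, which for Japanese requires a separate tokenization step not shown here.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Return the word-level edit distance divided by the reference length.

    Both arguments are sequences of pre-segmented tokens (an assumption;
    Japanese text has no whitespace word boundaries).
    """
    ref = list(reference)
    hyp = list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the first i-1 reference
    # tokens and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_tok in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1      # reference token missing from hypothesis
            insertion = curr[j - 1] + 1  # extra token in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical tokens: one substitution and one deletion over four words.
    ref_tokens = ["kyou", "wa", "ii", "tenki"]
    hyp_tokens = ["kyou", "ga", "ii"]
    print(f"WER = {word_error_rate(ref_tokens, hyp_tokens):.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference. A corpus-level figure is normally the total number of errors divided by the total number of reference words across all utterances, not the mean of per-utterance rates.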
Pages: 78-83
Number of pages: 6
Related papers
50 in total
  • [11] An Annotated Speech Corpus of Rare Dialect for Recognition-Take Dali Dialect as an Example
    Huang, Tian
    Yang, Dongqi
    Qin, Wanyun
    Zhang, Shubo
    Li, Binyang
    Li, Yan
    COGNITIVE COMPUTING, ICCC 2021, 2022, 12992 : 3 - 13
  • [12] JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research
    Itou, Katunobu
    Yamamoto, Mikio
    Takeda, Kazuya
    Takezawa, Toshiyuki
    Matsuoka, Tatsuo
    Kobayashi, Tetsunori
    Shikano, Kiyohiro
    Itahashi, Shuichi
    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi), 1999, 20 (03) : 199 - 206
  • [13] Speech corpus recycling for acoustic cross-domain environments for automatic speech recognition
    Ichikawa, Osamu
    Rennie, Steven J.
    Fukuda, Takashi
    Willett, Daniel
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2016, 37 (02) : 55 - 65
  • [14] Automatic Speech Recognition in Sanskrit: A New Speech Corpus and Modelling Insights
    Adiga, Devaraja
    Kumar, Rishabh
    Krishna, Amrith
    Jyothi, Preethi
    Ramakrishnan, Ganesh
    Goyal, Pawan
    Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021, : 5039 - 5050
  • [16] Constructing a Phonetic Transcribed Text Corpus for Southern Thai Dialect Speech Recognition
    Aunkaew, Sittichok
    Karnjanadecha, Montri
    Wutiwiwatchai, Chai
    PROCEEDINGS OF THE 2015 12TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE), 2015, : 69 - 73
  • [17] Elderly Conversational Speech Corpus with Cognitive Impairment Test and Pilot Dementia Detection Experiment Using Acoustic Characteristics of Speech in Japanese Dialects
    Fukuda, Meiko
    Umezawa, Maina
    Nishimura, Ryota
    Iribe, Yurie
    Yamamoto, Kazumasa
    Kitaoka, Norihide
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1016 - 1022
  • [18] Multidialectal Spanish acoustic modeling for speech recognition
    Caballero, Monica
    Moreno, Asuncion
    Nogueiras, Albino
    SPEECH COMMUNICATION, 2009, 51 (03) : 217 - 229
  • [19] Acoustic Modeling in Speech Recognition: A Systematic Review
    Bhatt, Shobha
    Jain, Anurag
    Dev, Amita
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (04) : 397 - 412
  • [20] Joint acoustic and language modeling for speech recognition
    Chien, Jen-Tzung
    Chueh, Chuang-Hua
    SPEECH COMMUNICATION, 2010, 52 (03) : 223 - 235