Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition

Authors
Imaizumi, Ryo [1]
Masumura, Ryo [2]
Shiota, Sayaka [1]
Kiya, Hitoshi [1]
Affiliations
[1] Tokyo Metropolitan Univ, Tokyo, Japan
[2] NTT Corp, NTT Media Intelligence Labs, Yokosuka, Kanagawa, Japan
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we present a novel model for building end-to-end Japanese-dialect automatic speech recognition (ASR) systems. It is known that ASR systems modeled on standard Japanese are not suitable for recognizing Japanese dialects, whose accents and vocabulary differ from those of standard Japanese. Therefore, we aim to produce dialect-specific end-to-end ASR systems for Japanese. Since it is difficult to collect a massive amount of speech-to-text paired data for each Japanese dialect, we utilize both dialect data and standard Japanese data to construct the dialect-specific end-to-end ASR systems. One primitive approach is multi-condition modeling, which simply merges the dialect data with the standard-language data. However, because of the mismatch between the dialects and the standard language, this simple multi-condition modeling fails to capture dialect-specific characteristics adequately. Thus, to produce reliable dialect-specific end-to-end ASR systems, we propose dialect-aware modeling, which utilizes dialect labels as auxiliary features. The main strength of the proposed method is that it effectively utilizes both dialect and standard-language data while adequately capturing dialect-specific characteristics. In our experiments using a home-made database of Japanese dialects, the proposed dialect-aware modeling outperformed the simple multi-condition modeling and achieved an error reduction of 19.2%.
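The abstract does not say how the dialect labels enter the network, so the following is only an illustrative sketch of one common way to use a label as an auxiliary feature: a one-hot dialect vector is appended to every acoustic frame, and dialect and standard-language utterances are then pooled as in multi-condition modeling. The label inventory `DIALECTS` and the helpers `one_hot`, `add_dialect_features`, and `build_training_set` are hypothetical names introduced here, not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical label inventory; the abstract does not list the dialects used.
DIALECTS = ["standard", "dialect_a", "dialect_b"]


def one_hot(label: str, inventory: Sequence[str] = DIALECTS) -> List[float]:
    """Encode a dialect label as a one-hot vector over the inventory."""
    if label not in inventory:
        raise ValueError(f"unknown dialect label: {label!r}")
    return [1.0 if name == label else 0.0 for name in inventory]


def add_dialect_features(
    frames: Sequence[Sequence[float]],
    label: str,
    inventory: Sequence[str] = DIALECTS,
) -> List[List[float]]:
    """Append the utterance-level dialect vector to every acoustic frame.

    Assumption: frame-level concatenation. The paper may inject the label
    elsewhere in the model (for example, at the encoder or the decoder).
    """
    tag = one_hot(label, inventory)
    return [list(frame) + tag for frame in frames]


Utterance = Tuple[Sequence[Sequence[float]], str, str]  # (frames, transcript, dialect)


def build_training_set(utterances: Sequence[Utterance]):
    """Pool dialect and standard-language data, keeping the label as an input feature."""
    return [
        (add_dialect_features(frames, label), transcript)
        for frames, transcript, label in utterances
    ]
```

Under this assumption, plain multi-condition modeling corresponds to pooling the same utterances without the appended vector, so the model has no explicit signal for telling a dialect utterance from a standard-language one.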
Pages: 297-301
Number of pages: 5