Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition

Authors
Imaizumi, Ryo [1]
Masumura, Ryo [2]
Shiota, Sayaka [1]
Kiya, Hitoshi [1]
Affiliations
[1] Tokyo Metropolitan Univ, Tokyo, Japan
[2] NTT Corp, NTT Media Intelligence Labs, Yokosuka, Kanagawa, Japan
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a novel model for building an end-to-end Japanese-dialect automatic speech recognition (ASR) system. ASR systems modeled on standard Japanese are known to be unsuitable for recognizing Japanese dialects, whose accents and vocabulary differ from those of standard Japanese. Therefore, we aim to produce dialect-specific end-to-end ASR systems for Japanese. Since it is difficult to collect a massive amount of speech-to-text paired data for each Japanese dialect, we utilize both dialect data and standard Japanese data to construct the dialect-specific end-to-end ASR systems. One primitive approach is multi-condition modeling, which simply merges the dialect data with the standard-language data. However, this simple multi-condition modeling fails to capture dialect-specific characteristics adequately because of the mismatch between the dialects and the standard language. Thus, to produce reliable dialect-specific end-to-end ASR systems, we propose dialect-aware modeling, which utilizes dialect labels as auxiliary features. The main strength of the proposed method is that it effectively utilizes both dialect and standard-language data while still capturing dialect-specific characteristics. In our experiments using an in-house database of Japanese dialects, the proposed dialect-aware modeling outperformed the simple multi-condition modeling and achieved an error reduction of 19.2%.
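The abstract does not say how the dialect labels enter the model. As a rough illustration only, one common way to use a label as an auxiliary feature is to append a one-hot dialect vector to every acoustic frame, so that standard-language and dialect utterances can share one model while remaining distinguishable. The dialect names, the function names, and the choice of frame-level concatenation below are all assumptions made for this sketch, not details taken from the paper.

```python
from typing import List, Sequence

# Hypothetical label inventory; the abstract does not list the dialects used.
DIALECTS = ["standard", "dialect_a", "dialect_b"]


def one_hot(label: str, labels: Sequence[str] = DIALECTS) -> List[float]:
    """Return a one-hot vector marking the position of `label` in `labels`."""
    if label not in labels:
        raise ValueError(f"unknown dialect label: {label!r}")
    return [1.0 if name == label else 0.0 for name in labels]


def add_dialect_features(
    frames: Sequence[Sequence[float]], label: str
) -> List[List[float]]:
    """Append the same one-hot dialect vector to every acoustic frame.

    This is an assumed way of injecting the label; the paper may instead
    feed it to the encoder, the decoder, or both.
    """
    aux = one_hot(label)
    return [list(frame) + aux for frame in frames]


if __name__ == "__main__":
    # Two toy 2-dimensional frames standing in for real acoustic features.
    toy_frames = [[0.1, 0.2], [0.3, 0.4]]
    for frame in add_dialect_features(toy_frames, "dialect_a"):
        print(frame)
```

Under this assumption, the merged training set of the multi-condition baseline is kept as is, and only the input dimensionality grows by the number of dialect labels.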
Pages: 297-301
Number of pages: 5