Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition

Cited: 0
Authors
Imaizumi, Ryo [1 ]
Masumura, Ryo [2 ]
Shiota, Sayaka [1 ]
Kiya, Hitoshi [1 ]
Affiliations
[1] Tokyo Metropolitan Univ, Tokyo, Japan
[2] NTT Corp, NTT Media Intelligence Labs, Yokosuka, Kanagawa, Japan
Keywords
DOI: none
CLC number: TP [automation technology; computer technology]
Discipline code: 0812
Abstract
In this paper, we present a novel model for building end-to-end Japanese-dialect automatic speech recognition (ASR) systems. ASR systems modeled on standard Japanese are known to be unsuitable for recognizing Japanese dialects, whose accents and vocabulary differ from the standard language. We therefore aim to produce dialect-specific end-to-end ASR systems for Japanese. Since it is difficult to collect a massive amount of speech-to-text paired data for each Japanese dialect, we utilize both dialect data and standard-language data to construct the dialect-specific end-to-end ASR systems. One primitive approach is multi-condition modeling, which simply merges the dialect data with the standard-language data. However, this simple multi-condition modeling fails to capture adequate dialect-specific characteristics because of the mismatch between the dialects and the standard language. Thus, to produce reliable dialect-specific end-to-end ASR systems, we propose dialect-aware modeling, which utilizes dialect labels as auxiliary features. The main strength of the proposed method is that it effectively utilizes both dialect and standard-language data while capturing adequate dialect-specific characteristics. In experiments using an in-house database of Japanese dialects, the proposed dialect-aware modeling outperformed the simple multi-condition modeling and achieved an error reduction of 19.2%.
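The idea of feeding dialect labels as auxiliary features can be sketched as follows. This is a hypothetical illustration only: a learned dialect embedding is concatenated to every acoustic frame before a shared encoder, so that one model trained on mixed standard-language and dialect data can condition on the dialect. The label set, embedding dimension, and concatenation point are assumptions, not the paper's exact architecture.

```python
import numpy as np

# Illustrative dialect label set (an assumption, not the paper's inventory).
DIALECTS = ["standard", "osaka", "tohoku", "kyushu"]

rng = np.random.default_rng(0)
# In a real system these embeddings would be learned jointly with the encoder;
# here they are random stand-ins with 8 dimensions.
dialect_embeddings = rng.normal(size=(len(DIALECTS), 8))

def add_dialect_feature(frames: np.ndarray, dialect: str) -> np.ndarray:
    """Append the dialect embedding to each acoustic frame.

    frames: (T, F) array of e.g. log-mel features.
    Returns a (T, F + 8) array to be fed to the shared encoder.
    """
    idx = DIALECTS.index(dialect)
    emb = dialect_embeddings[idx]               # (8,)
    tiled = np.tile(emb, (frames.shape[0], 1))  # repeat for all T frames
    return np.concatenate([frames, tiled], axis=1)

# Usage: 120 frames of 80-dim features, tagged as Osaka dialect.
x = rng.normal(size=(120, 80))
x_cond = add_dialect_feature(x, "osaka")
print(x_cond.shape)  # (120, 88)
```

Conditioning the shared encoder this way lets the standard-language data supply the bulk of the acoustic training signal while the auxiliary label tells the model which dialect-specific characteristics to attend to.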
Pages: 297-301
Page count: 5