Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition

Cited by: 3
Authors
Yadavalli, Aditya [1]
Mirishkar, Ganesh S. [1]
Vuppala, Anil Kumar [1]
Affiliations
[1] Int Inst Informat Technol, Speech Proc Lab, Hyderabad 500032, Telangana, India
Source
Keywords
multi-dialect; speech recognition; sequence-to-sequence; dialect recognition
DOI
10.21437/Interspeech.2022-10739
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Conventional Automatic Speech Recognition (ASR) systems are sensitive to dialect variation within a language, which degrades recognition accuracy. The current practice is therefore to use dialect-specific ASRs. However, dialect-specific information and data are hard to obtain, which makes dialect-specific ASRs difficult to build. Furthermore, maintaining multiple dialect-specific ASR systems for each language is cumbersome. We build a unified multi-dialect End-to-End ASR for three Telugu regional dialects (Telangana, Coastal Andhra, and Rayalaseema) that removes the need for a dialect recognition block and for multiple dialect-specific ASRs. We find that pooling the data and training a multi-dialect ASR benefits the low-resource dialect the most, with an improvement of over 9.71% in relative Word Error Rate (WER). We then experiment with multi-task ASRs in which the primary task is to transcribe the audio and the secondary task is to predict the dialect; we do this by adding a Dialect ID to the output targets. Such a model outperforms naive multi-dialect ASRs by up to 8.24% in relative WER. Additionally, we test this model on a dialect recognition task and find that it outperforms strong baselines by 6.14% in accuracy.
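As a rough illustration of the idea of adding a Dialect ID to the output targets, the sketch below shows one way a target sequence could be tagged and how a decoded hypothesis could be split back into a dialect prediction and a transcript. The token strings (`<TG>`, `<CA>`, `<RS>`), the function names, and the choice between prepending and appending the tag are all assumptions made for illustration; the abstract does not specify them. The helper `relative_wer_reduction` uses the usual definition of relative improvement, which the abstract also does not spell out.

```python
from typing import List, Optional, Tuple

# Hypothetical dialect tags: the abstract names the three dialects
# but does not give the token format used in the paper.
DIALECT_TOKENS = {
    "telangana": "<TG>",
    "coastal_andhra": "<CA>",
    "rayalaseema": "<RS>",
}
TOKEN_TO_DIALECT = {tok: name for name, tok in DIALECT_TOKENS.items()}


def add_dialect_id(transcript: List[str], dialect: str,
                   prepend: bool = True) -> List[str]:
    """Return the training target with a dialect tag added.

    Whether the tag goes first or last is an assumption; both are supported.
    """
    tag = DIALECT_TOKENS[dialect]
    return [tag] + transcript if prepend else transcript + [tag]


def split_hypothesis(hyp: List[str]) -> Tuple[Optional[str], List[str]]:
    """Split a decoded sequence into (predicted dialect, transcript tokens).

    The first dialect tag found is taken as the prediction; all dialect
    tags are removed from the transcript before scoring WER.
    """
    dialect: Optional[str] = None
    words: List[str] = []
    for tok in hyp:
        if tok in TOKEN_TO_DIALECT:
            if dialect is None:
                dialect = TOKEN_TO_DIALECT[tok]
        else:
            words.append(tok)
    return dialect, words


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER improvement in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    target = add_dialect_id(["w1", "w2"], "rayalaseema")
    print(target)
    print(split_hypothesis(target))
```

With a single output sequence carrying both the words and the tag, one decoder pass yields the transcript and the dialect label, so no separate dialect recognition block is needed in front of the recognizer.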
Pages: 1387-1391
Page count: 5
Related papers
50 in total
  • [1] End-to-end Japanese Multi-dialect Speech Recognition and Dialect Identification with Multi-task Learning
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    [J]. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2022, 11 (01)
  • [2] Multi-task CTC Training with Auxiliary Feature Reconstruction for End-to-end Speech Recognition
    Kurata, Gakuto
    Audhkhasi, Kartik
    [J]. INTERSPEECH 2019, 2019, : 1636 - 1640
  • [3] Hybrid Multi-Task Learning for End-To-End Multimodal Emotion Recognition
    Chen, Junjie
    Li, Yongwei
    Zhao, Ziping
    Liu, Xuefei
    Wen, Zhengqi
    Tao, Jianhua
    [J]. 2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1966 - 1971
  • [4] Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning
    Hou, Wenxin
    Dong, Yue
    Zhuang, Bairong
    Yang, Longfei
    Shi, Jiatong
    Shinozaki, Takahiro
    [J]. INTERSPEECH 2020, 2020, : 1037 - 1041
  • [5] JOINT CTC-ATTENTION BASED END-TO-END SPEECH RECOGNITION USING MULTI-TASK LEARNING
    Kim, Suyoun
    Hori, Takaaki
    Watanabe, Shinji
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4835 - 4839
  • [6] End-to-End Multi-Task Learning with Attention
    Liu, Shikun
    Johns, Edward
    Davison, Andrew J.
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1871 - 1880
  • [7] Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 297 - 301
  • [8] Age-Invariant Training for End-to-End Child Speech Recognition using Adversarial Multi-Task Learning
    Rumberg, Lars
    Ehlert, Hanna
    Luedtke, Ulrike
    Ostermann, Joern
    [J]. INTERSPEECH 2021, 2021, : 3850 - 3854
  • [9] Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech
    Ghorbani, Shahram
    Hansen, John H. L.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 762 - 774
  • [10] Dialect-aware Semi-supervised Learning for End-to-End Multi-dialect Speech Recognition
    Shiota, Sayaka
    Imaizumi, Ryo
    Masumura, Ryo
    Kiya, Hitoshi
    [J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 240 - 244