Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition

被引:3
|
作者
Yadavalli, Aditya [1 ]
Mirishkar, Ganesh S. [1 ]
Vuppala, Anil Kumar [1 ]
机构
[1] Int Inst Informat Technol, Speech Proc Lab, Hyderabad 500032, Telangana, India
来源
关键词
multi-dialect; speech recognition; sequence-to-sequence; dialect recognition;
D O I
10.21437/Interspeech.2022-10739
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Conventional Automatic Speech Recognition (ASR) systems are susceptible to dialect variations within a language, thereby adversely affecting the ASR. Therefore, the current practice is to use dialect-specific ASRs. However, dialect-specific information or data is hard to obtain making it difficult to build dialect-specific ASRs. Furthermore, it is cumbersome to maintain multiple dialect-specific ASR systems for each language. We build a unified multi-dialect End-to-End ASR that removes the need for a dialect recognition block and the need to maintain multiple dialect-specific ASRs for three Telugu regional dialects: Telangana, Coastal Andhra, and Rayalaseema. We find that pooling the data and training a multi-dialect ASR benefits the low-resource dialect the most - an improvement of over 9.71% in relative Word Error Rate (WER). Subsequently, we experiment with multi-task ASRs where the primary task is to transcribe the audio and the secondary task is to predict the dialect. We do this by adding a Dialect ID to the output targets. Such a model outperforms naive multi-dialect ASRs by up to 8.24% in relative WER. Additionally, we test this model on a dialect recognition task and find that it outperforms strong baselines by 6.14% in accuracy.
引用
收藏
页码:1387 / 1391
页数:5
相关论文
共 50 条
  • [1] End-to-end Japanese Multi-dialect Speech Recognition and Dialect Identification with Multi-task Learning
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2022, 11 (01)
  • [2] Multi-task CTC Training with Auxiliary Feature Reconstruction for End-to-end Speech Recognition
    Kurata, Gakuto
    Audhkhasi, Kartik
    INTERSPEECH 2019, 2019, : 1636 - 1640
  • [3] Hybrid Multi-Task Learning for End-To-End Multimodal Emotion Recognition
    Chen, Junjie
    Li, Yongwei
    Zhao, Ziping
    Liu, Xuefei
    Wen, Zhengqi
    Tao, Jianhua
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1966 - 1971
  • [4] Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning
    Hou, Wenxin
    Dong, Yue
    Zhuang, Bairong
    Yang, Longfei
    Shi, Jiatong
    Shinozaki, Takahiro
    INTERSPEECH 2020, 2020, : 1037 - 1041
  • [5] JOINT CTC-ATTENTION BASED END-TO-END SPEECH RECOGNITION USING MULTI-TASK LEARNING
    Kim, Suyoun
    Hori, Takaaki
    Watanabe, Shinji
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4835 - 4839
  • [6] End-to-End Multi-Task Learning with Attention
    Liu, Shikun
    Johns, Edward
    Davison, Andrew J.
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1871 - 1880
  • [7] Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 297 - 301
  • [8] Age-Invariant Training for End-to-End Child Speech Recognition using Adversarial Multi-Task Learning
    Rumberg, Lars
    Ehlert, Hanna
    Luedtke, Ulrike
    Ostermann, Joern
    INTERSPEECH 2021, 2021, : 3850 - 3854
  • [9] Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech
    Ghorbani, Shahram
    Hansen, John H. L.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 762 - 774
  • [10] Dialect-aware Semi-supervised Learning for End-to-End Multi-dialect Speech Recognition
    Shiota, Sayaka
    Imaizumi, Ryo
    Masumura, Ryo
    Kiya, Hitoshi
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 240 - 244