Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition

Times cited: 3
Authors
Yadavalli, Aditya [1]
Mirishkar, Ganesh S. [1]
Vuppala, Anil Kumar [1]
Affiliation
[1] International Institute of Information Technology, Speech Processing Lab, Hyderabad 500032, Telangana, India
Source
INTERSPEECH 2022
Keywords
multi-dialect; speech recognition; sequence-to-sequence; dialect recognition
DOI
10.21437/Interspeech.2022-10739
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Conventional Automatic Speech Recognition (ASR) systems are sensitive to dialectal variation within a language, which degrades their accuracy. The current practice is therefore to use dialect-specific ASRs. However, dialect-specific information and data are hard to obtain, which makes such systems difficult to build, and maintaining several dialect-specific ASR systems for every language is cumbersome. We build a unified multi-dialect End-to-End ASR for three regional dialects of Telugu (Telangana, Coastal Andhra, and Rayalaseema) that removes both the need for a dialect recognition block and the need to maintain multiple dialect-specific ASRs. We find that pooling the data to train a multi-dialect ASR benefits the low-resource dialect the most: its Word Error Rate (WER) improves by over 9.71% relative. We then experiment with multi-task ASRs in which the primary task is to transcribe the audio and the secondary task is to predict the dialect; this is done by adding a Dialect ID to the output targets. Such a model outperforms naive multi-dialect ASRs by up to 8.24% relative WER. Finally, when tested on a dialect recognition task, it outperforms strong baselines by 6.14% in accuracy.
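The abstract says the multi-task model is obtained by adding a Dialect ID to the output targets, and it reports gains as relative WER. The Python sketch that follows illustrates both ideas under explicit assumptions: the tag strings (`<TEL>`, `<CA>`, `<RAY>`), placing the tag in front of the transcript, and word-level tokens are hypothetical choices, not details confirmed by the paper, and the helper names `build_target`, `split_hypothesis`, `word_error_rate`, and `relative_wer_reduction` are illustrative only.

```python
from __future__ import annotations

# Hypothetical dialect tags; the paper's actual token names and tag position are not specified here.
DIALECT_TAGS = {
    "telangana": "<TEL>",
    "coastal_andhra": "<CA>",
    "rayalaseema": "<RAY>",
}
TAG_TO_DIALECT = {tag: name for name, tag in DIALECT_TAGS.items()}


def build_target(transcript: str, dialect: str) -> list[str]:
    """Prefix the dialect tag to the word-level target (assumed layout)."""
    return [DIALECT_TAGS[dialect]] + transcript.split()


def split_hypothesis(tokens: list[str]) -> tuple[str | None, list[str]]:
    """Separate a leading predicted dialect tag, if any, from the transcription."""
    if tokens and tokens[0] in TAG_TO_DIALECT:
        return TAG_TO_DIALECT[tokens[0]], tokens[1:]
    return None, tokens


def word_error_rate(ref: list[str], hyp: list[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match at no cost)
            )
        prev = cur
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline: float, new: float) -> float:
    """Relative improvement in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    target = build_target("w1 w2 w3", "coastal_andhra")
    print(target)  # ['<CA>', 'w1', 'w2', 'w3']
    dialect, words = split_hypothesis(["<CA>", "w1", "wX", "w3"])
    print(dialect, word_error_rate(target[1:], words))  # coastal_andhra 0.3333333333333333
    # Hypothetical WER values for illustration only (not results from the paper).
    print(relative_wer_reduction(40.0, 36.0))  # 10.0
```

With the tag emitted as the first output token, one decoder pass yields both the dialect prediction and the transcript, which is what lets a single model serve the ASR and dialect recognition tasks. A real system might instead append the tag or operate on subword units; the sketch does not settle which layout the authors used.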
Pages: 1387-1391
Number of pages: 5