Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition

Times cited: 3

Authors:

Yadavalli, Aditya [1]

Mirishkar, Ganesh S. [1]

Vuppala, Anil Kumar [1]

Affiliation:

[1] International Institute of Information Technology, Speech Processing Laboratory, Hyderabad 500032, Telangana, India

Source:

INTERSPEECH 2022 | 2022

Keywords:

multi-dialect; speech recognition; sequence-to-sequence; dialect recognition

DOI:

10.21437/Interspeech.2022-10739

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification code:

070206; 082403

Abstract:

Conventional Automatic Speech Recognition (ASR) systems are susceptible to dialect variations within a language, which degrades recognition performance. The current practice is therefore to use dialect-specific ASRs. However, dialect-specific information or data is hard to obtain, which makes dialect-specific ASRs difficult to build. Furthermore, it is cumbersome to maintain multiple dialect-specific ASR systems for each language. We build a unified multi-dialect End-to-End ASR for three Telugu regional dialects (Telangana, Coastal Andhra, and Rayalaseema) that removes the need both for a dialect recognition block and for multiple dialect-specific ASRs. We find that pooling the data and training a multi-dialect ASR benefits the low-resource dialect the most, with a relative Word Error Rate (WER) improvement of over 9.71%. We then experiment with multi-task ASRs in which the primary task is to transcribe the audio and the secondary task is to predict the dialect; we implement this by adding a Dialect ID to the output targets. Such a model outperforms naive multi-dialect ASRs by up to 8.24% in relative WER. Additionally, we test this model on a dialect recognition task and find that it outperforms strong baselines by 6.14% in accuracy.
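The multi-task setup described in the abstract, where a Dialect ID is added to the output targets, can be illustrated with a minimal Python sketch. It assumes the ID is a single special token prepended to each transcript; the abstract does not say how the ID is spelled or where it is placed, so the token names (`<TEL>`, `<CA>`, `<RAY>`) and helper functions below are hypothetical. Stripping the predicted token from a decoded hypothesis yields both a dialect label (for dialect recognition) and a transcript (for WER scoring). The WER and relative-reduction functions follow the standard definitions and are not taken from the paper.

```python
"""Minimal sketch: dialect-ID tokens as extra ASR output targets (hypothetical names)."""

from typing import Dict, List, Optional, Tuple

# Hypothetical dialect-ID tokens, one per Telugu dialect named in the abstract.
# The paper's actual token spelling and position are not given in the abstract.
DIALECT_TOKENS: Dict[str, str] = {
    "telangana": "<TEL>",
    "coastal_andhra": "<CA>",
    "rayalaseema": "<RAY>",
}
TOKEN_TO_DIALECT: Dict[str, str] = {tok: name for name, tok in DIALECT_TOKENS.items()}


def add_dialect_id(words: List[str], dialect: str) -> List[str]:
    """Build a training target: the dialect token first, then the transcript words."""
    return [DIALECT_TOKENS[dialect]] + words


def split_hypothesis(tokens: List[str]) -> Tuple[Optional[str], List[str]]:
    """Split a decoded sequence into (predicted dialect or None, transcript words)."""
    if tokens and tokens[0] in TOKEN_TO_DIALECT:
        return TOKEN_TO_DIALECT[tokens[0]], tokens[1:]
    return None, tokens


def word_error_rate(ref: List[str], hyp: List[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(ref), via word-level edit distance."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is row i-1 of the DP table: prev[j] = edit distance of ref[:i-1] vs hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative improvement in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    target = add_dialect_id(["w1", "w2", "w3"], "rayalaseema")
    print(target)  # ['<RAY>', 'w1', 'w2', 'w3']

    # A decoded hypothesis yields both a dialect label and a transcript.
    dialect, words = split_hypothesis(["<RAY>", "w1", "wX", "w3"])
    print(dialect, word_error_rate(["w1", "w2", "w3"], words))

    # Hypothetical WERs, chosen only to show the arithmetic.
    print(relative_wer_reduction(20.0, 18.0))  # 10.0
```

Placing the token first means the decoder commits to a dialect label before emitting any words, so the words can condition on it; appending it at the end would be an equally simple alternative, and the abstract does not say which placement the authors use.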

Pages: 1387-1391

Page count: 5