Enhancing Large Vocabulary Continuous Speech Recognition System for Urdu-English Conversational Code-Switched Speech

Times cited: 0

Authors:

Farooq, Muhammad Umar [1]

Adeeba, Farah [1]

Hussain, Sarmad [1]

Rauf, Sahar [1]

Khalid, Maryam [1]

Affiliation:

[1] Univ Engn & Technol, Ctr Language Engn, Al Khawarizmi Inst Comp Sci, Lahore, Pakistan

Source:

PROCEEDINGS OF 2020 23RD CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (ORIENTAL-COCOSDA 2020) | 2020

Keywords:

Urdu-English code-switching; Urdu speech recognition; under-resourced language

DOI:

10.1109/o-cocosda50338.2020.9295036

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper presents a first step towards a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Urdu-English code-switched conversational speech. Urdu is the national language and lingua franca of Pakistan, with 100 million speakers worldwide. English, on the other hand, is an official language of Pakistan and is commonly mixed with Urdu in daily communication. Because Urdu is an under-resourced language, no substantial Urdu-English code-switched corpus is available for developing a speech recognition system. In this research, a readily available 25-hour spontaneous Urdu speech corpus is revised and used to enhance a read-speech Urdu LVCSR system so that it can recognize code-switched speech. This data set is split into a 20-hour training set and a 5-hour test set. To enhance the system further, 10 hours of Urdu Broadcast (BC) data are collected and annotated in a semi-supervised manner. For acoustic modeling, a state-of-the-art DNN-HMM technique is used without any prior GMM-HMM training or alignments. Various techniques for improving the language model using monolingual data are investigated. Overall, the Word Error Rate (WER) on the test set is reduced from 40.71% to 26.95%.
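The results above are reported as percent WER. As a generic illustration only (the abstract does not name the authors' scoring tool, so this is not their implementation), WER is conventionally the minimum number of word substitutions, deletions, and insertions needed to turn the reference into the hypothesis, divided by the number of reference words. The sketch below computes it with a word-level Levenshtein distance and also derives the relative reduction implied by the reported figures, roughly 33.8%. The example sentence is a hypothetical romanized Urdu-English utterance, not taken from the paper's corpus.

```python
"""Word Error Rate (WER) via word-level Levenshtein distance, plus the
relative reduction implied by the abstract's figures (40.71% -> 26.95%)."""


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N for the minimum-cost alignment,
    where N is the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical romanized code-switched sentence, for illustration only.
    ref = "mera laptop kaam nahi kar raha"
    hyp = "mera laptop kam nahi raha"
    # One substitution (kaam -> kam) and one deletion (kar) over 6 words.
    print(f"WER = {word_error_rate(ref, hyp):.2%}")
    # WER figures reported in the abstract (baseline vs. final system).
    print(f"Relative WER reduction = {relative_reduction(40.71, 26.95):.1%}")
```

Running the script prints a WER of 33.33% for the toy pair and a relative reduction of 33.8% for the reported figures. Because insertions are counted, WER can exceed 100% when a hypothesis is much longer than its reference.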


Pages: 155-159

Number of pages: 5

