Enhancing Large Vocabulary Continuous Speech Recognition System for Urdu-English Conversational Code-Switched Speech

Citations: 0

Authors:

Farooq, Muhammad Umar [1]

Adeeba, Farah [1]

Hussain, Sarmad [1]

Rauf, Sahar [1]

Khalid, Maryam [1]

Affiliation:

[1] Univ Engn & Technol, Ctr Language Engn, Al Khawarizmi Inst Comp Sci, Lahore, Pakistan

Source:

PROCEEDINGS OF 2020 23RD CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (ORIENTAL-COCOSDA 2020) | 2020

Keywords:

Urdu-English code-switching; Urdu speech recognition; under-resourced language;

DOI:

10.1109/o-cocosda50338.2020.9295036

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a first step towards a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Urdu-English code-switched conversational speech. Urdu is the national language and lingua franca of Pakistan, with 100 million speakers worldwide. English, on the other hand, is an official language of Pakistan and is commonly mixed with Urdu in daily communication. Because Urdu is an under-resourced language, no substantial Urdu-English code-switched corpus is available for developing a speech recognition system. In this research, a readily available spontaneous Urdu speech corpus (25 hours) is revised and used to enhance a read-speech Urdu LVCSR system so that it can recognize code-switched speech. This data set is split into a 20-hour training set and a 5-hour test set. In addition, 10 hours of Urdu broadcast (BC) data are collected and annotated in a semi-supervised way to enhance the system further. For acoustic modeling, a state-of-the-art DNN-HMM modeling technique is used without any prior GMM-HMM training or alignments. Various techniques for improving the language model using monolingual data are investigated. The overall Word Error Rate (WER) is reduced from 40.71% to 26.95% on the test set.
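The abstract reports its result as Word Error Rate. As a rough illustration of that metric (not the authors' scoring pipeline, which the record does not describe), the sketch below computes WER as word-level edit distance divided by reference length, and the relative reduction between two WER values. The function names and the example sentences are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes whitespace tokenization, which is a simplification.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Made-up token sequences, not data from the paper.
    print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
    # WER figures quoted in the abstract.
    print(relative_reduction(40.71, 26.95))
```

Applied to the figures in the abstract, the drop from 40.71% to 26.95% is 13.76 percentage points absolute, or roughly a 33.8% relative reduction.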


Pages: 155-159

Page count: 5
