Enhancing Large Vocabulary Continuous Speech Recognition System for Urdu-English Conversational Code-Switched Speech

Cited by: 0

Authors:

Farooq, Muhammad Umar [1]

Adeeba, Farah [1]

Hussain, Sarmad [1]

Rauf, Sahar [1]

Khalid, Maryam [1]

Affiliations:

[1] University of Engineering and Technology, Centre for Language Engineering, Al-Khawarizmi Institute of Computer Science, Lahore, Pakistan

Source:

PROCEEDINGS OF 2020 23RD CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (ORIENTAL-COCOSDA 2020) | 2020

Keywords:

Urdu-English code-switching; Urdu speech recognition; under-resourced language;

DOI:

10.1109/o-cocosda50338.2020.9295036

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a first step towards a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Urdu-English code-switched conversational speech. Urdu is the national language and lingua franca of Pakistan, with 100 million speakers worldwide. English, on the other hand, is an official language of Pakistan and is commonly mixed with Urdu in daily communication. Because Urdu is an under-resourced language, no substantial Urdu-English code-switched corpus is available for developing a speech recognition system. In this research, a readily available spontaneous Urdu speech corpus (25 hours) is revised and used to enhance a read-speech Urdu LVCSR system so that it can recognize code-switched speech. This data set is split into a 20-hour training set and a 5-hour test set. In addition, 10 hours of Urdu broadcast (BC) data are collected and annotated in a semi-supervised way to enhance the system further. For acoustic modeling, a state-of-the-art DNN-HMM modeling technique is used without any prior GMM-HMM training and alignments. Various techniques for improving the language model using monolingual data are investigated. The overall Word Error Rate (WER) is reduced from 40.71% to 26.95% on the test set.
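For readers unfamiliar with the metric, the WER figures quoted in the abstract can be related to the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch under that assumption. It is not the authors' scoring tool, and the function names `word_error_rate` and `relative_reduction` are hypothetical.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Return (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # i deletions turn the reference prefix into an empty hypothesis
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline` (same units)."""
    return (baseline - improved) / baseline


# Toy transcript with placeholder tokens (not data from the paper):
# one substitution and one deletion against four reference words.
toy_wer = word_error_rate("a b c d".split(), "a x c".split())

# The two WER values reported in the abstract, in percent.
reported_gain = relative_reduction(40.71, 26.95)
```

Applied to the reported figures, the absolute drop is 13.76 percentage points, which corresponds to a relative WER reduction of roughly 33.8%.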


Pages: 155-159

Page count: 5
