Enhancing Large Vocabulary Continuous Speech Recognition System for Urdu-English Conversational Code-Switched Speech

Authors
Farooq, Muhammad Umar [1]
Adeeba, Farah [1]
Hussain, Sarmad [1]
Rauf, Sahar [1]
Khalid, Maryam [1]
Affiliation
[1] Univ Engn & Technol, Ctr Language Engn, Al Khawarizmi Inst Comp Sci, Lahore, Pakistan
Keywords
Urdu-English code-switching; Urdu speech recognition; under-resourced language
DOI
10.1109/o-cocosda50338.2020.9295036
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a first step towards a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Urdu-English code-switched conversational speech. Urdu is the national language and lingua franca of Pakistan, with 100 million speakers worldwide. English, on the other hand, is an official language of Pakistan and is commonly mixed with Urdu in daily communication. Because Urdu is an under-resourced language, no substantial Urdu-English code-switched corpus is available for developing a speech recognition system. In this research, a readily available spontaneous Urdu speech corpus (25 hours) is revised and used to enhance a read-speech Urdu LVCSR system so that it can recognize code-switched speech. This data set is split into a 20-hour training set and a 5-hour test set. A further 10 hours of Urdu broadcast (BC) data are collected and annotated in a semi-supervised way to enhance the system further. For acoustic modeling, a state-of-the-art DNN-HMM modeling technique is used without any prior GMM-HMM training or alignments. Various techniques for improving the language model using monolingual data are investigated. The overall Word Error Rate (WER) is reduced from 40.71% to 26.95% on the test set.
Pages: 155-159
Number of pages: 5